Re: [HACKERS] CIC and deadlocks

2007-04-10 Thread Tom Lane
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> When I looked at the code, it occurred to me that possibly we are
> OK with just taking shared lock on the procarray. That means that
> some other transaction can concurrently set its serializable snapshot
> while we are scanning the procarray. But that should not harm us:
> if we see the snapshot set, we wait for the transaction. A transaction
> which is setting its serializable snapshot NOW, can not see the
> tuples that we did not index, isn't it ?

[ itch... ]  The problem is with time-extended execution of
GetSnapshotData; what happens if the other guy lost the CPU for a good
long time while in the middle of GetSnapshotData?  He might set his
xmin based on info you saw as long gone.

You might be correct that it's safe, but the argument would have to
hinge on the OldestXmin process being unable to commit because of
someone holding shared ProcArrayLock; a point you are definitely not
making above.  (Study the comments in GetSnapshotData for awhile,
also those in xact.c's commit-related code.)

I'm about to head to bed and am certainly in no condition to carry the
proof through.  Have at it ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [DOCS] uuid type not documented

2007-04-10 Thread Marko Kreen

On 4/10/07, Peter Eisentraut <[EMAIL PROTECTED]> wrote:

The problem is that most of the standard methods are platform dependent, as
they require MAC addresses or a "good" random source, for instance.


FYI: good random source is already available in pgcrypto,
it uses either OpenSSL RAND_bytes() or internal strong PRNG.

--
marko

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Question about SHM_QUEUE

2007-04-10 Thread Tom Lane
ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> I have a question about SHM_QUEUE. Why do we need this component?

It's a hangover from Berkeley days that no one has felt a need to remove
yet.  The convention back then was that shared memory might be mapped to
different addresses in different processes.  We've since adopted the
assumption that everyone will see the same addresses, but we have not
made any attempt to eradicate the old approach everywhere.

> Then, can we replace SHM_QUEUE by a pointer-based double-linked list?
> It will be an "intrusive" version of Dllist.

What exactly will you gain by it?  I'm not inclined to fool with that
code for trivial reasons ...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] CIC and deadlocks

2007-04-10 Thread Pavan Deolasee

On 4/1/07, Tom Lane <[EMAIL PROTECTED]> wrote:



Good point.  I'm envisioning a procarray.c function along the
lines of
bool TransactionHasSnapshot(xid)
which returns true if the xid is currently listed in PGPROC
and has a nonzero xmin.  CIC's cleanup wait loop would check
this and ignore the xid if it returns false.  Your point means
that this function would have to take exclusive not shared lock
while scanning the procarray, which is kind of annoying, but
it seems not fatal since CIC isn't done all that frequently.



When I looked at the code, it occurred to me that possibly we are
OK with just taking shared lock on the procarray. That means that
some other transaction can concurrently set its serializable snapshot
while we are scanning the procarray. But that should not harm us:
if we see the snapshot set, we wait for the transaction. A transaction
which is setting its serializable snapshot NOW, can not see the
tuples that we did not index, isn't it ?

A patch based on the discussion is attached.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com


CIC_deadlock.patch
Description: Binary data

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] prepared statements logging

2007-04-10 Thread tomas
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Tue, Apr 10, 2007 at 02:53:32AM -0700, marcofuics wrote:
> Hi *
> I am using the postgresql-8.2.3, with a jdbc-8.2-504 (the GeoNet
> webServer tool...) My question is :
> Is the <> server able to log the whole <> query?
> {made by a prepared statement}
> Looking at the log I can see only queries of the type:
> TIP 5: don't forget to increase your free space map settings

Wrong list. This is one dedicated for people working on the innards of
PostgreSQL. You'd want one of  or
<[EMAIL PROTECTED]>.

Now to your question: there are many configuration parameters in the
config file (possibly in /etc/postgresql/8.2/main/postgresql.conf or
some similar location, depending on your OS and distribution) to control
the level of logging. Look there for entries named log_xxx and
debug_xxx.

HTH
- -- tomás
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFGHHpCBcgs9XrR2kYRAs4SAJ45gvCVcp4Avk4t7NgQ/SST0s35owCggdWQ
CcVc9UluRe4d/Gl8ERV7z4I=
=P8EJ
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Question about SHM_QUEUE

2007-04-10 Thread ITAGAKI Takahiro
Hello,

I have a question about SHM_QUEUE. Why do we need this component?

We've already made some modules under the assumption that the base offset
of shared memory is mapped to the same address for all processes.
See comment in freespace.h:

 * Note: we handle pointers to these items as pointers, not as SHMEM_OFFSETs.
 * This assumes that all processes accessing the map will have the shared
 * memory segment mapped at the same place in their address space.

Then, can we replace SHM_QUEUE by a pointer-based double-linked list?
It will be an "intrusive" version of Dllist.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]

2007-04-10 Thread Tom Lane
Kris Kennaway <[EMAIL PROTECTED]> writes:
> On Wed, Apr 11, 2007 at 01:03:50AM -0400, Tom Lane wrote:
>> Well, the thing is, we've pretty much had it handed to us that
>> current-command indicators that aren't up to date are not very useful.
>> So rate-limited updates strike me as a useless compromise.

> I don't get your argument - ps auxww is never going to be 100%
> up-to-date because during the time the command is running the status
> may change.

Of course.  But we have already done the update-once-every-half-second
bit --- that was how pg_stat_activity used to work --- and our users
made clear that it's not good enough.  So I don't see us expending
significant effort to convert the setproctitle code path to that
approach.  The clear way of the future for expensive-setproctitle
platforms is just to turn it off entirely and rely on the new
pg_stat_activity implementation.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Kris Kennaway
On Tue, Apr 10, 2007 at 05:36:17PM -0400, Tom Lane wrote:
> Kris Kennaway <[EMAIL PROTECTED]> writes:
> > I have not studied the exact code path, but there are indeed multiple
> > wakeups happening from the semaphore code (as many as the number of
> > active postgresql processes).  It is easy to instrument
> > sleepq_broadcast() and log them when they happen.
> 
> There are certainly cases where Postgres will wake up a number of
> processes in quick succession, but that should happen from a separate
> semop() kernel call, on a different semaphore, for each such process.
> If there's really multiple processes being released by the same semop()
> then there's a bug we need to look into (or maybe it's a kernel bug?).
> Anyway I'd be interested to know what the test case is, and which PG
> version you were testing.

I used 8.2 (and some older version when I first noticed it a year ago)
and either sysbench or supersmack will show it - presumably anything
that makes simultaneous queries.  Just instrument sleepq_broadcast()
to e.g. log a KTR event when it wakes more than 1 process and you'll
see it happening.

Kris


pgptMLonITGtT.pgp
Description: PGP signature


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Kris Kennaway
On Tue, Apr 10, 2007 at 02:46:56PM -0400, Tom Lane wrote:
> Kris Kennaway <[EMAIL PROTECTED]> writes:
> >>> Make SYSV semaphores less dumb about process wakeups.  Currently
> >>> whenever the semaphore state changes, all processes sleeping on the
> >>> semaphore are woken, even if we only have released enough resources
> >>> for one waiting process to claim.
> 
> >> Correct.  The behavior Kris describes is surely bad, but it's not
> >> relevant to Postgres' usage of SysV semaphores.
> 
> > Sorry, but the behaviour is real.
> 
> Oh, I'm sure the BSD kernel acts as you describe.  But Mark's point is
> that Postgres never has more than one process waiting on any particular
> SysV semaphore, and so the problem doesn't really affect us.
> 
> Or do you mean that the kernel wakes all processes sleeping on *any*
> SysV semaphore?  That would be nasty :-(

To be clear, some behaviour that postgresql does with sysv semaphores
causes wakeups of many processes at once.  i.e. if you have 20
clients, you will get up to 20 wakeups.  I haven't studied the precise
cause of this, but it is empirically true.  This is the scaling
problem I described, and it's what mux's patch addresses.

Kris


pgp00SdLk8acL.pgp
Description: PGP signature


Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]

2007-04-10 Thread Kris Kennaway
On Wed, Apr 11, 2007 at 01:03:50AM -0400, Tom Lane wrote:
> Kris Kennaway <[EMAIL PROTECTED]> writes:
> > I think the high number of setproctitle() calls are more problematic
> > to us at the moment, Kris can comment on that.
> 
> > Since we've basically had it handed to us that calling setproctitle()
> > thousands of times per second is something that real applications now
> > do, we're pretty much forced to work on making it cheaper.
> > ...
> > However this won't help all the existing systems out there (including
> > other affected OSes), so it would be great if you guys could meet us
> > half way and find a way to make postgresql rate-limit these calls by
> > default to some suitable compromise rate, like once/second or
> > whatever.
> 
> Well, the thing is, we've pretty much had it handed to us that
> current-command indicators that aren't up to date are not very useful.
> So rate-limited updates strike me as a useless compromise.  We have
> the "real" solution (status advertised in PG's shared memory) already,
> so the question in my mind is just how fast DBAs will wish to transition
> to looking at "select * from pg_stat_activity" instead of looking at
> "ps auxww".

I don't get your argument - ps auxww is never going to be 100%
up-to-date because during the time the command is running the status
may change.  So we already know that stats being a fraction of a
second out of date are acceptable to users, because that's what may
happen when you run ps in the present model.  So you can use this to
get away with limiting updates to e.g. 10/second and in practise no
users will notice the difference.

Updating thousands of times a second just on the off chance that an
admin may one day run ps is completely inefficient (and has a huge
overhead on non-Linux systems, so it's demonstrably not a sensible way
to do things), and to the extent that there is a problem to be solved
it isn't even really solving it anyway.

If there really are users who find 10 proctitle updates/second an
unacceptably low update rate, then tune for the default case and
provide an option to allow them to override the rate limit to whatever
update rate they find appropriate.

Kris


pgpReJUB51aNd.pgp
Description: PGP signature


Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]

2007-04-10 Thread Kris Kennaway
On Wed, Apr 11, 2007 at 12:50:06PM +1200, Mark Kirkwood wrote:
> Tom Lane wrote:
> 
> >
> >>I think the high number of setproctitle() calls are more problematic
> >>to us at the moment, Kris can comment on that.
> >
> >As of PG 8.2 it is possible to turn those off.  I don't think there's a
> >lot of enthusiasm for turning them off by default ... at least not yet.
> >But it might make sense to point out in the PG documentation that
> >update_process_title is particularly costly on platforms X, Y, and Z.
> >Do you know if this issue affects all the BSDen equally?
> >
> 
> 
> Might be good to turn off by default for the 8.2+ Postgresql versions in 
> the FreeBSD ports tree (looks like postgresql.conf.sample is being 
> patched anyway, so pretty easy to amend).

Yeah, we might end up doing this, but I consider it a workaround.

Kris


pgpBiA1MKrVDJ.pgp
Description: PGP signature


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Kris Kennaway
On Tue, Apr 10, 2007 at 06:26:37PM -0400, Tom Lane wrote:
> Kris Kennaway <[EMAIL PROTECTED]> writes:
> > On Tue, Apr 10, 2007 at 05:36:17PM -0400, Tom Lane wrote:
> >> Anyway I'd be interested to know what the test case is, and which PG
> >> version you were testing.
> 
> > I used 8.2 (and some older version when I first noticed it a year ago)
> > and either sysbench or supersmack will show it - presumably anything
> > that makes simultaneous queries.  Just instrument sleepq_broadcast()
> > to e.g. log a KTR event when it wakes more than 1 process and you'll
> > see it happening.
> 
> Sorry, I'm not much of a BSD kernel hacker ... but sleepq_broadcast
> seems a rather generic name.  Is that called *only* from semop?

It's part of how wakeup() is implemented.

> I'm wondering if you are seeing simultaneous wakeup from some other
> cause --- sleep timeout being the obvious possibility.  We are aware
> of behaviors (search the PG lists for "context swap storm") where a
> number of backends will all fail to get a spinlock and do short usleep
> or select-timeout waits.  In this situation they'd all wake up at the
> next scheduler clock tick ...

Nope, it's not this.

Kris


pgpa4cQe39p9O.pgp
Description: PGP signature


Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]

2007-04-10 Thread Kris Kennaway
On Tue, Apr 10, 2007 at 08:23:36PM -0400, Tom Lane wrote:

> > I think the high number of setproctitle() calls are more problematic
> > to us at the moment, Kris can comment on that.
> 
> As of PG 8.2 it is possible to turn those off.  I don't think there's a
> lot of enthusiasm for turning them off by default ... at least not yet.
> But it might make sense to point out in the PG documentation that
> update_process_title is particularly costly on platforms X, Y, and Z.
> Do you know if this issue affects all the BSDen equally?

It will likely affect them to some extent.  In fact the only platforms
it will not hurt on are those which have already jumped through
special hoops to make setproctitle() super-cheap.  I presume Linux is
in this category but don't know which others are, if any.

Since we've basically had it handed to us that calling setproctitle()
thousands of times per second is something that real applications now
do, we're pretty much forced to work on making it cheaper.  Hopefully
this is something that will be addressed over the next few months
(we're going to look at adding support for pages shared between libc
and kernel so this kind of thing can be done without requiring a
syscall).

However this won't help all the existing systems out there (including
other affected OSes), so it would be great if you guys could meet us
half way and find a way to make postgresql rate-limit these calls by
default to some suitable compromise rate, like once/second or
whatever.

Kris


pgpxMmw1cbEgN.pgp
Description: PGP signature


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Kris Kennaway
On Tue, Apr 10, 2007 at 03:52:00PM -0400, Tom Lane wrote:
> Kris Kennaway <[EMAIL PROTECTED]> writes:
> > On Tue, Apr 10, 2007 at 02:46:56PM -0400, Tom Lane wrote:
> >> Oh, I'm sure the BSD kernel acts as you describe.  But Mark's point is
> >> that Postgres never has more than one process waiting on any particular
> >> SysV semaphore, and so the problem doesn't really affect us.
> 
> > To be clear, some behaviour that postgresql does with sysv semaphores
> > causes wakeups of many processes at once.  i.e. if you have 20
> > clients, you will get up to 20 wakeups.  I haven't studied the precise
> > cause of this, but it is empirically true.  This is the scaling
> > problem I described, and it's what mux's patch addresses.
> 
> [ shrug... ]  To the extent that that happens, it's Postgres' own issue,
> and no amount of kernel rejiggering will change it.  But I certainly
> have no objection to a patch that fixes the kernel behavior ...

As we've discussed before, by far the bigger issue with postgresql
performance on FreeBSD is the default setting of
update_process_titles=on.

Kris


pgpxNR2bN01jL.pgp
Description: PGP signature


Re: [HACKERS] Idle idea for a feature

2007-04-10 Thread NikhilS

Hi,

On 4/11/07, Guillaume Smet <[EMAIL PROTECTED]> wrote:


On 4/10/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> ISTM it'd be a good idea if it did, as "are there any incoming
> foreign keys" seems to be a question we constantly ask when solving
> update-performance problems, and there isn't any easy way to check for
> such.

On similar lines, maybe \d can also show the list of inheritors when

invoked on a parent.
e.g:
postgres=# \d parent
   Table "public.parent"
Column |  Type   | Modifiers
+-+---
a  | integer |
Indexes:
   "parent_a_key" UNIQUE, btree (a)
Inherited by: child

Regards,
Nikhils
--
EnterpriseDB   http://www.enterprisedb.com


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-10 Thread Hannu Krosing
Ühel kenal päeval, T, 2007-04-10 kell 18:17, kirjutas Joshua D. Drake:
> > In terms of idle time for gzip and other command to archive WAL offline,
> > no difference in the environment was given other than the command to
> > archive.   My guess is because the user time is very large in gzip, it
> > has more chance for scheduler to give resource to other processes.   In
> > the case of cp, idle time is more than 30times longer than user time.
> > Pg_compresslog uses seven times longer idle time than user time.  On the
> > other hand, gzip uses less idle time than user time.   Considering the
> > total amount of user time, I think it's reasonable measure.
> > 
> > Again, in my proposal, it is not the issue to increase run time
> > performance.   Issue is to decrease the size of archive log to save the
> > storage.
> 
> Considering the relatively little amount of storage a transaction log
> takes, it would seem to me that the performance angle is more appropriate.

As I understand it it's not about transaction log but about write-ahead
log.

and the amount of data in WAL can become very important once you have to
keep standby servers in different physical locations (cities, countries
or continents) where channel throughput and cost comes into play.

With simple cp (scp/rsync) the amount of WAL data needing to be copied
is about 10x more than data collected by trigger based solutions
(Slony/pgQ). With pg_compresslog WAL-shipping seems to have roughly the
same amount and thus becomes a viable alternative again.

> Is it more efficient in other ways besides negligible tps? Possibly more
> efficient memory usage? Better restore times for a crashed system?

I think that TPS is more affected by number of writes than size of each
block written, so there is probably not that much to gain in TPS, except
perhaps from better disk cache usage. 

For me pg_compresslog seems to be a winner even if it just does not
degrade performance.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [DOCS] uuid type not documented

2007-04-10 Thread Roman Neuhauser
# [EMAIL PROTECTED] / 2007-04-10 15:49:08 -0400:
> Neil Conway wrote:
> >On Tue, 2007-04-10 at 18:28 +0200, Peter Eisentraut wrote:
> >  
> >>The problem is that most of the standard methods are platform dependent, 
> >>as they require MAC addresses or a "good" random source, for instance.
> >>
> >
> >http://archives.postgresql.org/pgsql-patches/2007-01/msg00392.php
> >
> >ISTM random() or similar sources is a sufficient PSRNG for the purposes
> >of UUID generation -- I can't see anything in the RFC that would
> >contradict that.
> >
> >  
> 
> How about we set up a contrib (I wish we'd fixed that) module with an 
> example function or two?

Thought I'd mention Ralph Engelschall's uuid library, comes with
a PostgreSQL binding:

http://www.ossp.org/pkg/lib/uuid/
http://cvs.ossp.org/fileview?f=ossp-pkg/uuid/pgsql/uuid.txt&v=1.1

I only played with it some time ago.

-- 
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE. http://bash.org/?255991

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]

2007-04-10 Thread Tom Lane
Kris Kennaway <[EMAIL PROTECTED]> writes:
> I think the high number of setproctitle() calls are more problematic
> to us at the moment, Kris can comment on that.

> Since we've basically had it handed to us that calling setproctitle()
> thousands of times per second is something that real applications now
> do, we're pretty much forced to work on making it cheaper.
> ...
> However this won't help all the existing systems out there (including
> other affected OSes), so it would be great if you guys could meet us
> half way and find a way to make postgresql rate-limit these calls by
> default to some suitable compromise rate, like once/second or
> whatever.

Well, the thing is, we've pretty much had it handed to us that
current-command indicators that aren't up to date are not very useful.
So rate-limited updates strike me as a useless compromise.  We have
the "real" solution (status advertised in PG's shared memory) already,
so the question in my mind is just how fast DBAs will wish to transition
to looking at "select * from pg_stat_activity" instead of looking at
"ps auxww".

I don't see anything wrong at all with making update_process_title
default to "off" in BSD-specific packaging of Postgres.  It's a harder
sell to turn it off by default everywhere, because of all them Linux
users for whom that's just taking away a convenient status viewing
method.  I think we might get there eventually, but we need a decent
interval to wean people away from the old method.

[ Disclaimer: I work for Red Hat, so am unlikely to favor doing anything
that is a loss on Linux.  But I do use and like other platforms too;
just don't happen to have any BSD in-house currently, unless you're
willing to count Darwin as BSD. ]

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-10 Thread Joshua D. Drake

> In terms of idle time for gzip and other command to archive WAL offline,
> no difference in the environment was given other than the command to
> archive.   My guess is because the user time is very large in gzip, it
> has more chance for scheduler to give resource to other processes.   In
> the case of cp, idle time is more than 30times longer than user time.
> Pg_compresslog uses seven times longer idle time than user time.  On the
> other hand, gzip uses less idle time than user time.   Considering the
> total amount of user time, I think it's reasonable measure.
> 
> Again, in my proposal, it is not the issue to increase run time
> performance.   Issue is to decrease the size of archive log to save the
> storage.

Considering the relatively little amount of storage a transaction log
takes, it would seem to me that the performance angle is more appropriate.

Is it more efficient in other ways besides negligible tps? Possibly more
efficient memory usage? Better restore times for a crashed system?

Sincerely,

Joshua D. Drake

> 
> Regards;
> 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-10 Thread Koichi Suzuki
Hi,

In the case below, we run DBT-2 benchmark for one hour to get the
measure.   Checkpoint occured three times (checkpoint interval was 20min).

For more information, when checkpoint interval is one hour, the amount
of the archived log size was as follows:
cp: 3.1GB
gzip:   1.5GB
pg_compresslog: 0.3GB

For both cases, database size was 12.7GB, relatively small.

As pointed out, if we don't run the checkpoint forever, the value for cp
will become close to that for pg_compresslog, but it is not practical.

The point here is, if we collect archive log with cp and the average
work load is a quarter of the full power, cp archiving will produce
about 0.8GB archive log per hour (for DBT-2 case, of course the size
depends on the nature of the transaction).   If we run the database
whole day, the amount of the archive log will be as large as database
itself.   After one week, archive log size gets seven times as large as
the database itself.   This is the point.   In production, such large
archive log will raise storage cost.   The purpose of the proposal is
not to improve the performance, but to decrease the size of archive log
to save necessary storage, preserving the same chance of recovery at the
crash recovery as full_page_writes=on.

Because of DBT-2 nature, it is not meaningful to compare the throuput
(databsae size determines the number of transactions to run).   Instead,
 I compared the throuput using pgbench.   These measures are: cp:
570tps, gzip:558tps, pg_compresslog: 574tps, negligible difference.

In terms of idle time for gzip and other command to archive WAL offline,
no difference in the environment was given other than the command to
archive.   My guess is because the user time is very large in gzip, it
has more chance for scheduler to give resource to other processes.   In
the case of cp, idle time is more than 30times longer than user time.
Pg_compresslog uses seven times longer idle time than user time.  On the
other hand, gzip uses less idle time than user time.   Considering the
total amount of user time, I think it's reasonable measure.

Again, in my proposal, it is not the issue to increase run time
performance.   Issue is to decrease the size of archive log to save the
storage.

Regards;


Tom Lane wrote:
> Koichi Suzuki <[EMAIL PROTECTED]> writes:
>> My proposal is to remove unnecessary full page writes (they are needed 
>> in crash recovery from inconsistent or partial writes) when we copy WAL 
>> to archive log and rebuilt them as a dummy when we restore from archive 
>> log.
>> ...
>> Benchmark: DBT-2
>> Database size: 120WH (12.3GB)
>> Total WAL size: 4.2GB (after 60min. run)
>> Elapsed time:
>>cp:120.6sec
>>gzip:  590.0sec
>>pg_compresslog: 79.4sec
>> Resultant archive log size:
>>cp: 4.2GB
>>gzip:   2.2GB
>>pg_compresslog: 0.3GB
>> Resource consumption:
>>cp:   user:   0.5sec system: 15.8sec idle:  16.9sec I/O wait: 87.7sec
>>gzip: user: 286.2sec system:  8.6sec idle: 260.5sec I/O wait: 36.0sec
>>pg_compresslog:
>>  user:   7.9sec system:  5.5sec idle:  37.8sec I/O wait: 28.4sec
> 
> What checkpoint settings were used to make this comparison?  I'm
> wondering whether much of the same benefit can't be bought at zero cost
> by increasing the checkpoint interval, because that translates directly
> to a reduction in the number of full-page images inserted into WAL.
> 
> Also, how much was the database run itself slowed down by the increased
> volume of WAL (due to duplicated information)?  It seems rather
> pointless to me to measure only the archiving effort without any
> consideration for the impact on the database server proper.
> 
>   regards, tom lane
> 
> PS: there's something fishy about the gzip numbers ... why all the idle
> time?
> 
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
> 
>http://archives.postgresql.org
> 


-- 
Koichi Suzuki

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]

2007-04-10 Thread Mark Kirkwood

Tom Lane wrote:




I think the high number of setproctitle() calls are more problematic
to us at the moment, Kris can comment on that.


As of PG 8.2 it is possible to turn those off.  I don't think there's a
lot of enthusiasm for turning them off by default ... at least not yet.
But it might make sense to point out in the PG documentation that
update_process_title is particularly costly on platforms X, Y, and Z.
Do you know if this issue affects all the BSDen equally?




Might be good to turn off by default for the 8.2+ Postgresql versions in 
the FreeBSD ports tree (looks like postgresql.conf.sample is being 
patched anyway, so pretty easy to amend).


Cheers

Mark


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]

2007-04-10 Thread Tom Lane
Maxime Henrion <[EMAIL PROTECTED]> writes:
> Thanks for forwarding my mail, Kris!  To Tom: if you can get my mails
> to reach pgsql-hackers@ somehow that would be just great :-).

They'll get approved eventually, just like mine to the BSD lists will
get approved eventually ;-)

>> The only thing we could do to fix that from our end would be to use
>> a smaller sema-set size on *BSD platforms.  Is the overhead per sema set
>> small enough to make this a sane thing to do?  Will we be likely to
>> run into system limits on the number of sets?

> I'm not familiar enough with the PostgreSQL code to know what impact
> such a change could have, but since the problem is clearly on our
> side here, I would advise against doing changes in PostgreSQL that
> are likely to complicate the code for little gain.  We still didn't
> even fully measure how much the useless wakups cost us since we're
> running into other contention problems with my patch that removes
> those.  And, as you point out, there are complications ensuing with
> respect to system limits (we already ask users to bump them when
> they install PostgreSQL).

OK, it was just an off-the-cuff idea.

> I think the high number of setproctitle() calls are more problematic
> to us at the moment, Kris can comment on that.

As of PG 8.2 it is possible to turn those off.  I don't think there's a
lot of enthusiasm for turning them off by default ... at least not yet.
But it might make sense to point out in the PG documentation that
update_process_title is particularly costly on platforms X, Y, and Z.
Do you know if this issue affects all the BSDen equally?

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]

2007-04-10 Thread Tom Lane
Kris Kennaway <[EMAIL PROTECTED]> forwards:
> Yes but there are still a lot of wakeups to be avoided in the current
> System V semaphore code.  More specifically, not only do we wakeup all
> the processes waiting on a single semaphore everytime something changes,
> but we also wakeup all processes waiting on *any* of the semaphore in
> the semaphore *set*, whatever the reason we're sleeping.

O ... *that's* the problem.  Ugh.  Although we have a separate
semaphore for each PG backend, they're grouped into semaphore sets
(I think 16 active semaphores per set).  So a wakeup intended for one
process would uselessly send up to 15 others through the semop code.

The only thing we could do to fix that from our end would be to use
a smaller sema-set size on *BSD platforms.  Is the overhead per sema set
small enough to make this a sane thing to do?  Will we be likely to
run into system limits on the number of sets?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Tom Lane
Kris Kennaway <[EMAIL PROTECTED]> writes:
> On Tue, Apr 10, 2007 at 05:36:17PM -0400, Tom Lane wrote:
>> Anyway I'd be interested to know what the test case is, and which PG
>> version you were testing.

> I used 8.2 (and some older version when I first noticed it a year ago)
> and either sysbench or supersmack will show it - presumably anything
> that makes simultaneous queries.  Just instrument sleepq_broadcast()
> to e.g. log a KTR event when it wakes more than 1 process and you'll
> see it happening.

Sorry, I'm not much of a BSD kernel hacker ... but sleepq_broadcast
seems a rather generic name.  Is that called *only* from semop?
I'm wondering if you are seeing simultaneous wakeup from some other
cause --- sleep timeout being the obvious possibility.  We are aware
of behaviors (search the PG lists for "context swap storm") where a
number of backends will all fail to get a spinlock and do short usleep
or select-timeout waits.  In this situation they'd all wake up at the
next scheduler clock tick ...

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Andrew - Supernews
On 2007-04-10, Tom Lane <[EMAIL PROTECTED]> wrote:
> Kris Kennaway <[EMAIL PROTECTED]> writes:
>> I have not studied the exact code path, but there are indeed multiple
>> wakeups happening from the semaphore code (as many as the number of
>> active postgresql processes).  It is easy to instrument
>> sleepq_broadcast() and log them when they happen.
>
> There are certainly cases where Postgres will wake up a number of
> processes in quick succession, but that should happen from a separate
> semop() kernel call, on a different semaphore, for each such process.
> If there's really multiple processes being released by the same semop()
> then there's a bug we need to look into (or maybe it's a kernel bug?).
> Anyway I'd be interested to know what the test case is, and which PG
> version you were testing.

This is a problem in FreeBSD, not specifically to do with postgres - the
granularity for SysV semaphore wakeups in FreeBSD-6.x and earlier is the
entire semaphore set, not just one specific semaphore within the set. I
explained that to Kris some weeks ago, and someone (mux) did a patch (to
FreeBSD, not pg) which was already mentioned in this discussion.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Tom Lane
Kris Kennaway <[EMAIL PROTECTED]> writes:
> I have not studied the exact code path, but there are indeed multiple
> wakeups happening from the semaphore code (as many as the number of
> active postgresql processes).  It is easy to instrument
> sleepq_broadcast() and log them when they happen.

There are certainly cases where Postgres will wake up a number of
processes in quick succession, but that should happen from a separate
semop() kernel call, on a different semaphore, for each such process.
If there's really multiple processes being released by the same semop()
then there's a bug we need to look into (or maybe it's a kernel bug?).
Anyway I'd be interested to know what the test case is, and which PG
version you were testing.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [DOCS] uuid type not documented

2007-04-10 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> On Tue, 2007-04-10 at 18:28 +0200, Peter Eisentraut wrote:
>> The problem is that most of the standard methods are platform dependent, as 
>> they require MAC addresses or a "good" random source, for instance.

> http://archives.postgresql.org/pgsql-patches/2007-01/msg00392.php

> ISTM random() or similar sources is a sufficient PSRNG for the purposes
> of UUID generation -- I can't see anything in the RFC that would
> contradict that.

Doesn't seem like quite enough bits of uniqueness.

We could improve matters by incorporating the database's
pg_control.system_identifier into the UUID, substituting for the MAC
address we don't have a good way to get.  system_identifier is currently
determined by the system clock at initdb time (to gettimeofday
precision), so it would add at least some additional bits of
randomness...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Idle idea for a feature

2007-04-10 Thread Guillaume Smet

On 4/10/07, Tom Lane <[EMAIL PROTECTED]> wrote:

ISTM it'd be a good idea if it did, as "are there any incoming
foreign keys" seems to be a question we constantly ask when solving
update-performance problems, and there isn't any easy way to check for
such.


Sure. We wrote a stored proc to do that and it could be really nice to
have it by defaut in the table definition.

Something like:
Incoming foreign keys:
  "name_of_the_fkey" (column of this table) REFERENCED BY
table(columns of the other table)
could be nice.
Not sure about the title of the section. Another problem is that this
syntax is not based on valid SQL syntax.

--
Guillaume

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [DOCS] uuid type not documented

2007-04-10 Thread Florian G. Pflug

Neil Conway wrote:

On Tue, 2007-04-10 at 18:28 +0200, Peter Eisentraut wrote:
The problem is that most of the standard methods are platform dependent, as 
they require MAC addresses or a "good" random source, for instance.


http://archives.postgresql.org/pgsql-patches/2007-01/msg00392.php

ISTM random() or similar sources is a sufficient PSRNG for the purposes
of UUID generation -- I can't see anything in the RFC that would
contradict that.


Maybe a short-term solution could be a UUID-generated function that
takes some kind of seed as a parameter. People not concerned about
collisons could just pass some random value, while others could use
the mac-address of the client or something similar.

greetings, Florian Pflug


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Tom Lane
Kris Kennaway <[EMAIL PROTECTED]> writes:
> On Tue, Apr 10, 2007 at 02:46:56PM -0400, Tom Lane wrote:
>> Oh, I'm sure the BSD kernel acts as you describe.  But Mark's point is
>> that Postgres never has more than one process waiting on any particular
>> SysV semaphore, and so the problem doesn't really affect us.

> To be clear, some behaviour that postgresql does with sysv semaphores
> causes wakeups of many processes at once.  i.e. if you have 20
> clients, you will get up to 20 wakeups.  I haven't studied the precise
> cause of this, but it is empirically true.  This is the scaling
> problem I described, and it's what mux's patch addresses.

[ shrug... ]  To the extent that that happens, it's Postgres' own issue,
and no amount of kernel rejiggering will change it.  But I certainly
have no objection to a patch that fixes the kernel behavior ...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] "select ('{}'::text[])[1]" returns NULL -- is it correct?

2007-04-10 Thread Florian G. Pflug

Nikolay Samokhvalov wrote:

On 4/10/07, Tom Lane <[EMAIL PROTECTED]> wrote:

"Nikolay Samokhvalov" <[EMAIL PROTECTED]> writes:
> I remember several cases when people (e.g. me :-) ) were spending some
> time trying to find an error in some pl/pgsql function and the reason
> lied in incorrect work with arrays (i.e. messages like "index is out
> of bounds" and "index cannot be negative number" would help, surely).

Well, if indexes *couldn't* be negative numbers then that might be
helpful, but they can.


Ooops :-) OK, my proposal is narrowing to very simple one: what about
triggering WARNINGs when user tries to access nonexistent element of
array?


Please don't ;-)
There are two sane options - return an error, or return NULL. Both are
sensible, and different programming languages make different choices.

The only reason for a WARNING would be a long-term plan to change the
existing behaviour. But this will cause lots of pain, for no real gain,
because no matter which behaviour you pick, there are always situations
where the other would be more convenient.

Just look at the mess PHP has created by altering fundamental aspects
of the language (4.4 -> 5.0).

greetings, Florian Pflug

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] [DOCS] uuid type not documented

2007-04-10 Thread Andrew Dunstan

Neil Conway wrote:

On Tue, 2007-04-10 at 18:28 +0200, Peter Eisentraut wrote:
  
The problem is that most of the standard methods are platform dependent, as 
they require MAC addresses or a "good" random source, for instance.



http://archives.postgresql.org/pgsql-patches/2007-01/msg00392.php

ISTM random() or similar sources is a sufficient PSRNG for the purposes
of UUID generation -- I can't see anything in the RFC that would
contradict that.

  


How about we set up a contrib (I wish we'd fixed that) module with an 
example function or two?


cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [DOCS] uuid type not documented

2007-04-10 Thread Neil Conway
On Tue, 2007-04-10 at 18:28 +0200, Peter Eisentraut wrote:
> The problem is that most of the standard methods are platform dependent, as 
> they require MAC addresses or a "good" random source, for instance.

http://archives.postgresql.org/pgsql-patches/2007-01/msg00392.php

ISTM random() or similar sources is a sufficient PSRNG for the purposes
of UUID generation -- I can't see anything in the RFC that would
contradict that.

-Neil



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] "select ('{}'::text[])[1]" returns NULL -- is it correct?

2007-04-10 Thread Nikolay Samokhvalov

On 4/10/07, Tom Lane <[EMAIL PROTECTED]> wrote:

"Nikolay Samokhvalov" <[EMAIL PROTECTED]> writes:
> I remember several cases when people (e.g. me :-) ) were spending some
> time trying to find an error in some pl/pgsql function and the reason
> lied in incorrect work with arrays (i.e. messages like "index is out
> of bounds" and "index cannot be negative number" would help, surely).

Well, if indexes *couldn't* be negative numbers then that might be
helpful, but they can.



Ooops :-) OK, my proposal is narrowing to very simple one: what about
triggering WARNINGs when user tries to access nonexistent element of
array?

--
Best regards,
Nikolay

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] "select ('{}'::text[])[1]" returns NULL -- is it correct?

2007-04-10 Thread Nikolay Samokhvalov

On 4/9/07, Tom Lane <[EMAIL PROTECTED]> wrote:

"Nikolay Samokhvalov" <[EMAIL PROTECTED]> writes:
> As I can see here, when I ask for element that doesn't exist, the
> database returns NULL for me. Maybe it's well-known issue (and
> actually I understood this behaviour before), but strictly speaking it
> seems wrong for me: the database _knows_ that there is no element, so
> why NULL?
[...]

AFAIR it's always been like that, so changing it seems exceedingly
likely to break some peoples' applications.  It's not completely without
analogies in SQL, anyway: consider the behavior of INSERT when fewer
columns are provided than the table has.  Pretending that elements
outside the stored range of the array are null is not all that different
from silently adding nulls to a row-to-be-stored.


OK, I see.
But if I try to INSERT to column that doesn't exist in the table, I
have an error.
Why pg's arrays are designed so that postgres doesn't produce errors
for attempts to access nonexistent element of array? Why there is no
simple sanity check (SELECT (ARRAY[6,8])[-1] -- works w/o an error)? I
remember several cases when people (e.g. me :-) ) were spending some
time trying to find an error in some pl/pgsql function and the reason
lied in incorrect work with arrays (i.e. messages like "index is out
of bounds" and "index cannot be negative number" would help, surely).

--
Best regards,
Nikolay

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] prepared statements logging

2007-04-10 Thread marcofuics
Hi *
I am using the postgresql-8.2.3, with a jdbc-8.2-504 (the GeoNet
webServer tool...) My question is :
Is the <> server able to log the whole <> query?
{made by a prepared statement}
Looking at the log I can see only queries of the type:

(cut).

LOG:  execute : SELECT * FROM Metadata WHERE id=$1
DETAIL:  parameters: $1 = '28'
DEBUG:  parse : SELECT schemaId, createDate, lastChangeDate,
source, isTemplate, uuid FROM Metadata WHERE id = 28

(cut).

The strange is that my query has to search on a "Test vector data" and
in the whole log-file this word <> is not present!

My desire is to have a line where I can see:
SELECTbla..bla...bla   FROM   MetadataWHERE id = "...some
pattern"   [maybe islike some pattern]

Thx in advance.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Kris Kennaway
On Tue, Apr 10, 2007 at 10:23:42AM -0400, Tom Lane wrote:
> Mark Kirkwood <[EMAIL PROTECTED]> writes:
> > Kris Kennaway wrote:
> >> If so, then your task is the following:
> >> 
> >> Make SYSV semaphores less dumb about process wakeups.  Currently
> >> whenever the semaphore state changes, all processes sleeping on the
> >> semaphore are woken, even if we only have released enough resources
> >> for one waiting process to claim.  i.e. there is a thundering herd
> >> wakeup situation which destroys performance at high loads.  Fixing
> >> this will involve replacing the wakeup() calls with appropriate
> >> amounts of wakeup_one().
> 
> > I'm forwarding this to the pgsql-hackers list so that folks more 
> > qualified than I can comment, but as I understand the way postgres 
> > implements locking each process has it *own* semaphore it waits on  - 
> > and who is waiting for what is controlled by an in (shared) memory hash 
> > of lock structs (access to these is controlled via platform Dependant 
> > spinlock code). So a given semaphore state change should only involve 
> > one process wakeup.
> 
> Correct.  The behavior Kris describes is surely bad, but it's not
> relevant to Postgres' usage of SysV semaphores.

Sorry, but the behaviour is real.

Kris

pgphJTqz6La4j.pgp
Description: PGP signature


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Kris Kennaway
On Tue, Apr 10, 2007 at 10:41:04PM +1200, Mark Kirkwood wrote:
> Kris Kennaway wrote:
> >If so, then your task is the following:
> >
> >Make SYSV semaphores less dumb about process wakeups.  Currently
> >whenever the semaphore state changes, all processes sleeping on the
> >semaphore are woken, even if we only have released enough resources
> >for one waiting process to claim.  i.e. there is a thundering herd
> >wakeup situation which destroys performance at high loads.  Fixing
> >this will involve replacing the wakeup() calls with appropriate
> >amounts of wakeup_one().
> 
> I'm forwarding this to the pgsql-hackers list so that folks more 
> qualified than I can comment, but as I understand the way postgres 
> implements locking each process has it *own* semaphore it waits on  - 
> and who is waiting for what is controlled by an in (shared) memory hash 
> of lock structs (access to these is controlled via platform Dependant 
> spinlock code). So a given semaphore state change should only involve 
> one process wakeup.

I have not studied the exact code path, but there are indeed multiple
wakeups happening from the semaphore code (as many as the number of
active postgresql processes).  It is easy to instrument
sleepq_broadcast() and log them when they happen.

Anyway mux@ fixed this some time ago, which indeed helped scaling for
traffic over a local domain socket (particularly at higher loads), but
I saw some anomalous results when using loopback TCP traffic.  I think
this is unrelated (in this situation TCP is highly contended, and it
is often the case that fixing one bottleneck can make a highly
contended situation perform worse, because you were effectively
serializing a bit before, and reducing the non-linear behaviour) but
am still investigating, so the patch has not yet been committed.

Kris


pgpDDvfRzeiGJ.pgp
Description: PGP signature


Re: [HACKERS] [PATCHES] Fix mdsync never-ending loop problem

2007-04-10 Thread Tom Lane
ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
> Here is another patch to fix never-ending loop in mdsync. I introduced
> a mdsync counter (cycle id) and cancel flags to fix the problem.

> The mdsync counter is incremented at the every beginning of mdsync().
> Each pending entry has a field assigned from the counter when it is
> newly inserted to pendingOpsTable. Only entries that have smaller counter
> values than the mdsync counter are fsync-ed in mdsync().

> Another change is to add a cancel flag in each pending entry. When a
> relation is dropped and bgwriter receives a forget-request, the corresponding
> entry is marked as dropped but we don't delete it at that time. Actual
> deletion is performed in the next fsync loop. We don't have to retry after
> AbsorbFsyncRequests() because entries are not removed outside of seqscan.

This patch looks fairly sane to me; I have a few small gripes about
coding style but that can be fixed while applying.  Heikki, you were
concerned about the cycle-ID idea; do you have any objection to this
patch?

> This patch can be applied to HEAD, 8.2 and 8.1 with a few hunks.

I don't think we should back-patch something that's a performance fix
for an extreme case, especially not when it's not been through any
extensive testing yet ...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] Fix mdsync never-ending loop problem

2007-04-10 Thread ITAGAKI Takahiro
(Sorry if you receive duplicate messages. I resend it since it was not
 delivered after a day.)


Here is another patch to fix never-ending loop in mdsync. I introduced
a mdsync counter (cycle id) and cancel flags to fix the problem.

The mdsync counter is incremented at the every beginning of mdsync().
Each pending entry has a field assigned from the counter when it is
newly inserted to pendingOpsTable. Only entries that have smaller counter
values than the mdsync counter are fsync-ed in mdsync().

Another change is to add a cancel flag in each pending entry. When a
relation is dropped and bgwriter receives a forget-request, the corresponding
entry is marked as dropped but we don't delete it at that time. Actual
deletion is performed in the next fsync loop. We don't have to retry after
AbsorbFsyncRequests() because entries are not removed outside of seqscan.

This patch can be applied to HEAD, 8.2 and 8.1 with a few hunks.


Tom Lane <[EMAIL PROTECTED]> wrote:
> > In my understanding from the discussion, we'd better to take "cycle ID"
> > approach instead of "making a copy of pendingOpsTable", because duplicated
> > table is hard to debug and requires us to pay attention not to leak 
> > memories.
> > I'll adopt the cycle ID approach and build LDC on it as a separate patch.
> 
> Heikki made some reasonable arguments against the cycle-ID idea.  I'm
> not intending to insist on it ...

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



fix_mdsync.patch
Description: Binary data

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] [HACKERS] [Fwd: Index Advisor]

2007-04-10 Thread Gurjeet Singh

Hi Tom,

   The original patch was submitted by Kai Sattler, and we (at EDB) spent a
lot of time improving it, making it as seamless and as user-friendly as
possible. As is evident from the version number of the patch (v26), it has
gone through a lot of iterations, and was available to the community for
review and discussion (and discuss they did; they asked for a few things and
those were added/improved).


I am thinking the API needs to be simpified, perhaps by removing the system
table and having the recommendations just logged to the server logs.



This means that this very useful information (in log files) would need to be
passed through an intermediary or another tool developed to allow access to
this information. I think that having this available from a table would be
very nice.


   In the initial submission, the feature was a big piece of code embedded
inside the backend. It required a system table, did not show the new plan,
actually created index physically before re planning, and could not advise
for a running application (everything had to be manually EXPLAINed).

   I read through the thread titled "Index Tuning Features" that first
discussed the idea of an Index adviser for PG, and this patch also meets
quite a few requirements raised there.

   Here are a few of the good things about this patch as of now:

.) Loadable plugin. Develop your own plugin to do nifty things with the plan
generated by the planner. Just as the debugger is implemented; if no
plugin... no work to do...

.) No syntax change. Run your queries as they are and get the advice in the
advise_index table (or set client_min_messages = LOG, to see the improved
plan on the screen also, if any).

.) Can recommend indexes even for the generated dynamic-queries, that are
hard to regenerate in a dry-run.

.) Can recommend indexes for SQL being executed through plpgsql (or any PL)
(again, hard to regenerate the parameterized queries by hand), and the the
advice is available in the advise_index table.

.) The adviser dumps it's advice in a table named advise_index. That can be
a user table, or a view with INSERT rule, or anything else; it should just
be an INSERTable object, accessible to the executing user (as opposed to a
system table required by the original implementation, and hence a need for
initdb).

.) No need to modify the application in any way; just set PGOPTIONS
environment variable properly before executing the appln., and run it as
usual... you have the advice generated for you.

.) No need for DBA (or the appln. writer) to feed anything to the planner in
any way; the process of recommendation is fully automated (this may change
if another plugin implimentation requires the stats in some user table).

.) Does recommend multi-column indexes. Does not make a set of each
fathomable combination of table columns to develop multi-column indexes
(hence avoiding a combinatorial explosion of time-space requirements); it
uses the columns used in the query to generate multi-column indexes.

.) The indexes are not created on disk; the index-tuple-size calculation
function does a very good job of estimating the size of the virtual index.

.) The changes to the catalog are just for the backend running under the
adviser, no one else can see those virtual indexes (as opposed to the
earlier implementation where the indexes were created on-disk, and available
to all the backends in the planning phase).

   So, with one hook (no GUC variables!), we get all these cool things. I
tried very hard to eliminate that one leftover kludge, but couldn't (we have
two options here, and they are enclosed in '#if GLOBAL_CAND_LIST ... #else'
parts of the code; left upto the committers to decide which one we need!).

   Another kludge that I had to add was the SPI_connect() and SPI_finish()
frame around the savepoint handling, since the RollbackToSavepoint in
xact.cassumes that only a PL/* module must be using the savepoint
infrastucture
(this was discussed on -hackers).

   The interface etc. may not be beautiful, but it isn't ugly either! It is
a lot better than manually creating pg_index records and inserting them into
cache; we use index_create() API to create the index (build is deferred),
and then 'rollback to savepoint' to undo those changes when the advisor is
done. index_create() causes pg_depends entries too, so a 'RB to SP' is far
much safer than going and deleting cache records manually.

   I hope you would agree that we need two passes of planner, one without
v-indexes and the other with v-indexes, for the backend to compare the
costs, and recommend indexes only if the second plan turned out to be
cheaper. If we implement the way you have suggested, then we will need one
hook at the end of get_relation_info(), one in EXPLAIN code, and yet
another, someplace after planner is finished, to do the comparison of the
two plans and recommend only those indexes that were considered to be useful
by the planner. (A total of three places to touch).

Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Tom Lane
Kris Kennaway <[EMAIL PROTECTED]> writes:
>>> Make SYSV semaphores less dumb about process wakeups.  Currently
>>> whenever the semaphore state changes, all processes sleeping on the
>>> semaphore are woken, even if we only have released enough resources
>>> for one waiting process to claim.

>> Correct.  The behavior Kris describes is surely bad, but it's not
>> relevant to Postgres' usage of SysV semaphores.

> Sorry, but the behaviour is real.

Oh, I'm sure the BSD kernel acts as you describe.  But Mark's point is
that Postgres never has more than one process waiting on any particular
SysV semaphore, and so the problem doesn't really affect us.

Or do you mean that the kernel wakes all processes sleeping on *any*
SysV semaphore?  That would be nasty :-(

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] Fix mdsync never-ending loop problem

2007-04-10 Thread Tom Lane
I wrote:
> This patch looks fairly sane to me; I have a few small gripes about
> coding style but that can be fixed while applying.  Heikki, you were
> concerned about the cycle-ID idea; do you have any objection to this
> patch?

Actually, on second look I think the key idea here is Takahiro-san's
introduction of a cancellation flag in the hashtable entries, to
replace the cases where AbsorbFsyncRequests can try to delete entries.

What that means is mdsync() doesn't need an outer retry loop at all:
the periodic AbsorbFsyncRequests calls are not a hazard, and retry of
FileSync failures can be handled as an inner loop on the single failing
table entry.  (We can make the failure counter a local variable, too,
instead of needing space in every hashtable entry.)

And with that change, it's no longer possible for an incoming stream
of fsync requests to keep mdsync from terminating.  It might fsync
more than it really needs to, but it won't repeat itself, and it must
reach the end of the hashtable eventually.  So we don't actually need
the cycle counter at all.

It might be worth having the cycle counter anyway just to avoid doing
"useless" fsync work.  I'm not sure about this.  If we have a cycle
counter of say 32 bits, then it's theoretically possible for an fsync
to fail 2^32 consecutive times and then be skipped on the next try,
allowing a checkpoint to succeed that should not have.  We can fix that
with a few more lines of logic to detect a wrapped-around value, but is
it worth the trouble?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-10 Thread Tom Lane
Koichi Suzuki <[EMAIL PROTECTED]> writes:
> My proposal is to remove unnecessary full page writes (they are needed 
> in crash recovery from inconsistent or partial writes) when we copy WAL 
> to archive log and rebuilt them as a dummy when we restore from archive 
> log.
> ...
> Benchmark: DBT-2
> Database size: 120WH (12.3GB)
> Total WAL size: 4.2GB (after 60min. run)
> Elapsed time:
>cp:120.6sec
>gzip:  590.0sec
>pg_compresslog: 79.4sec
> Resultant archive log size:
>cp: 4.2GB
>gzip:   2.2GB
>pg_compresslog: 0.3GB
> Resource consumption:
>cp:   user:   0.5sec system: 15.8sec idle:  16.9sec I/O wait: 87.7sec
>gzip: user: 286.2sec system:  8.6sec idle: 260.5sec I/O wait: 36.0sec
>pg_compresslog:
>  user:   7.9sec system:  5.5sec idle:  37.8sec I/O wait: 28.4sec

What checkpoint settings were used to make this comparison?  I'm
wondering whether much of the same benefit can't be bought at zero cost
by increasing the checkpoint interval, because that translates directly
to a reduction in the number of full-page images inserted into WAL.

Also, how much was the database run itself slowed down by the increased
volume of WAL (due to duplicated information)?  It seems rather
pointless to me to measure only the archiving effort without any
consideration for the impact on the database server proper.

regards, tom lane

PS: there's something fishy about the gzip numbers ... why all the idle
time?

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [DOCS] uuid type not documented

2007-04-10 Thread Tom Lane
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> Peter Eisentraut wrote:
>> The problem is that most of the standard methods are platform dependent, as 
>> they require MAC addresses or a "good" random source, for instance.  I'm not
>> sure how we wanted to solve that, but certainly leaving the uuid type with 
>> *no* method to generate one is pretty poor.

> Actually, I would say that not having the ability to generate a UUID is 
> just fine. Most languages, have the ability to generate them per their 
> particular platforms. Let's leave it to them.

Let us *not* open that can of worms for 8.3.  This patch would not have
got in except that it didn't attempt to solve that problem, and there's
even less time available now.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [DOCS] uuid type not documented

2007-04-10 Thread Joshua D. Drake

Peter Eisentraut wrote:

Am Dienstag, 10. April 2007 17:30 schrieb Neil Conway:

On Tue, 2007-04-10 at 17:24 +0200, Peter Eisentraut wrote:

The new uuid type is lacking documentation.

We had also talked about including some UUID generation functionality in
8.3, but it should be okay to leave that for 8.4.


The problem is that most of the standard methods are platform dependent, as 
they require MAC addresses or a "good" random source, for instance.  I'm not 
sure how we wanted to solve that, but certainly leaving the uuid type with 
*no* method to generate one is pretty poor.


Actually, I would say that not having the ability to generate a UUID is 
just fine. Most languages, have the ability to generate them per their 
particular platforms. Let's leave it to them.


Sincerely,

Joshua D. Drake







--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [DOCS] uuid type not documented

2007-04-10 Thread Peter Eisentraut
Am Dienstag, 10. April 2007 17:30 schrieb Neil Conway:
> On Tue, 2007-04-10 at 17:24 +0200, Peter Eisentraut wrote:
> > The new uuid type is lacking documentation.
>
> We had also talked about including some UUID generation functionality in
> 8.3, but it should be okay to leave that for 8.4.

The problem is that most of the standard methods are platform dependent, as 
they require MAC addresses or a "good" random source, for instance.  I'm not 
sure how we wanted to solve that, but certainly leaving the uuid type with 
*no* method to generate one is pretty poor.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Group Commit

2007-04-10 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> I've refrained from spending time on group commit until the 
> commit-no-wait patch lands, because it's going to conflict anyway. I'm 
> starting to feel we should not try to rush group commit into 8.3, unless 
> it somehow falls out of the commit-no-wait patch by accident, given that 
> we're past feature freeze and coming up with a proper group commit 
> algorithm would need a lot of research and testing. Better do it for 8.4 
> with more time, we've got enough features on plate for 8.3 anyway.

It's possible that it *would* fall out of commit-no-wait, if we are
alert to the possibility of shaking the tree in the right direction ;-)
Otherwise I agree with waiting till 8.4 to deal with it.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] \da doesn't show result type

2007-04-10 Thread Peter Eisentraut
Am Dienstag, 10. April 2007 17:12 schrieb Magnus Hagander:
> > > Is there a reason that \da doesn't show the functions result type, like
> > > \df does?  I would find that information useful.

> Eh, but it does? As of
> http://archives.postgresql.org/pgsql-committers/2007-03/msg00138.php.

Ah, OK, one feature in 8.3 that I can use. :)

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] \da doesn't show result type

2007-04-10 Thread Magnus Hagander
On Tue, Apr 10, 2007 at 10:32:38AM -0400, Tom Lane wrote:
> Peter Eisentraut <[EMAIL PROTECTED]> writes:
> > Is there a reason that \da doesn't show the functions result type, like \df 
> > does?  I would find that information useful.
> 
> +1, I've been annoyed by that too.

Eh, but it does? As of
http://archives.postgresql.org/pgsql-committers/2007-03/msg00138.php.

Or am I missing what you're asking for here? :-)

//Magnus


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Idle idea for a feature

2007-04-10 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
> one).  ISTM it'd be a good idea if it did, as "are there any incoming
> foreign keys" seems to be a question we constantly ask when solving
> update-performance problems, and there isn't any easy way to check for
> such.  I'm not real sure what the printout should look like, though.

Agreed.  Suggestion:

networx=> \d wdm_networx.loc_base_clin
 Table "wdm_networx.loc_base_clin"
   Column|  Type   | Modifiers 
-+-+---
 btable_id   | integer | not null
 loc_base_id | integer | not null
 clin| integer | not null
Indexes:
"loc_base_clin_pkey" PRIMARY KEY, btree (btable_id, loc_base_id, clin)
"btable_id_clin_unique" UNIQUE, btree (btable_id, clin)
Foreign-key constraints:
"loc_base_clin_btable_id_fkey" FOREIGN KEY (btable_id) REFERENCES 
wdm_networx.btables_ref(btable_id)
"loc_base_clin_clin_fkey" FOREIGN KEY (clin) REFERENCES 
wdm_networx.clin(clin)
"loc_base_clin_loc_base_id_fkey" FOREIGN KEY (loc_base_id) REFERENCES 
wdm_networx.loc_base_dscr(loc_base_id)

networx=> \d wdm_networx.loc_base_dscr
  Table "wdm_networx.loc_base_dscr"
 Column |  Type  | Modifiers 
++---
 loc_base_id| integer| not null
 description| character varying(254) | not null
 locations  | character varying(254) | 
 univ_mandatory | character varying(254) | 
 ent_mandatory  | character varying(254) | 
Indexes:
"loc_base_dscr_pkey" PRIMARY KEY, btree (loc_base_id)
Referenced by:
"loc_base_clin_loc_base_id_fkey" FOREIGN KEY (loc_base_id) BY 
wdm_networx.loc_base_clin(loc_base_id)
 /|\/|\
Referenced column(s) in *this* table <|  |---> 
column(s) in referencing table

Just my 2c.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] \da doesn't show result type

2007-04-10 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> Is there a reason that \da doesn't show the functions result type, like \df 
> does?  I would find that information useful.

+1, I've been annoyed by that too.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] TOASTing smaller things

2007-04-10 Thread Luke Lonergan
Hi Bruce,

How about these:

- Allow specification of TOAST size threshold in bytes on a per column basis
- Enable storage of columns in separate TOAST tables
- Enable use of multi-row compression method(s) for TOAST tables

- Luke


On 3/26/07 5:39 PM, "Bruce Momjian" <[EMAIL PROTECTED]> wrote:

> Luke Lonergan wrote:
>> I advocate the following:
>> 
>> - Enable specification of TOAST policy on a per column basis
>> 
>> As a first step, then:
>> 
>> - Enable vertical partitioning of tables using per-column specification of
>> storage policy.
>> 
> 
> How are these different from ALTER TABLE SET STORAGE?  They need to be
> more specific.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Tom Lane
Mark Kirkwood <[EMAIL PROTECTED]> writes:
> Kris Kennaway wrote:
>> If so, then your task is the following:
>> 
>> Make SYSV semaphores less dumb about process wakeups.  Currently
>> whenever the semaphore state changes, all processes sleeping on the
>> semaphore are woken, even if we only have released enough resources
>> for one waiting process to claim.  i.e. there is a thundering herd
>> wakeup situation which destroys performance at high loads.  Fixing
>> this will involve replacing the wakeup() calls with appropriate
>> amounts of wakeup_one().

> I'm forwarding this to the pgsql-hackers list so that folks more 
> qualified than I can comment, but as I understand the way postgres 
> implements locking each process has it *own* semaphore it waits on  - 
> and who is waiting for what is controlled by an in (shared) memory hash 
> of lock structs (access to these is controlled via platform Dependant 
> spinlock code). So a given semaphore state change should only involve 
> one process wakeup.

Correct.  The behavior Kris describes is surely bad, but it's not
relevant to Postgres' usage of SysV semaphores.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] Idle idea for a feature

2007-04-10 Thread Tom Lane
psql's \d command tells you about outgoing foreign key constraints
(ie, ones referencing another table from this one).  It doesn't tell
you about incoming ones (ie, ones where another table references this
one).  ISTM it'd be a good idea if it did, as "are there any incoming
foreign keys" seems to be a question we constantly ask when solving
update-performance problems, and there isn't any easy way to check for
such.  I'm not real sure what the printout should look like, though.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] "select ('{}'::text[])[1]" returns NULL -- is it correct?

2007-04-10 Thread Tom Lane
"Nikolay Samokhvalov" <[EMAIL PROTECTED]> writes:
> I remember several cases when people (e.g. me :-) ) were spending some
> time trying to find an error in some pl/pgsql function and the reason
> lied in incorrect work with arrays (i.e. messages like "index is out
> of bounds" and "index cannot be negative number" would help, surely).

Well, if indexes *couldn't* be negative numbers then that might be
helpful, but they can.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] \da doesn't show result type

2007-04-10 Thread Peter Eisentraut
Is there a reason that \da doesn't show the functions result type, like \df 
does?  I would find that information useful.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Anyone interested in improving postgresql scaling?

2007-04-10 Thread Mark Kirkwood

Kris Kennaway wrote:

If so, then your task is the following:

Make SYSV semaphores less dumb about process wakeups.  Currently
whenever the semaphore state changes, all processes sleeping on the
semaphore are woken, even if we only have released enough resources
for one waiting process to claim.  i.e. there is a thundering herd
wakeup situation which destroys performance at high loads.  Fixing
this will involve replacing the wakeup() calls with appropriate
amounts of wakeup_one().


I'm forwarding this to the pgsql-hackers list so that folks more 
qualified than I can comment, but as I understand the way postgres 
implements locking each process has it *own* semaphore it waits on  - 
and who is waiting for what is controlled by an in (shared) memory hash 
of lock structs (access to these is controlled via platform Dependant 
spinlock code). So a given semaphore state change should only involve 
one process wakeup.


Cheers

Mark

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Group Commit

2007-04-10 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> writes:
I've been working on the patch to enhance our group commit behavior. The 
patch is a dirty hack at the moment, but I'm settled on the algorithm 
I'm going to use and I know the issues involved.


One question that just came to mind is whether Simon's no-commit-wait
patch doesn't fundamentally alter the context of discussion for this.
Aside from the prospect that people won't really care about group commit
if they can just use the periodic-WAL-sync approach, ISTM that one way
to get group commit is to just make everybody wait for the dedicated
WAL writer to write their commit record.  With a sufficiently short
delay between write/fsync attempts in the background process, won't
that net out at about the same place as a complicated group-commit
patch?


Possibly. To get efficient group commit there would need to be some kind 
of signaling between the WAL writer and normal backends. I think there 
is some in the patch, but I'm not sure if it gives efficient group 
commit. A constant delay will just give us something similar to 
commit_delay.


I've refrained from spending time on group commit until the 
commit-no-wait patch lands, because it's going to conflict anyway. I'm 
starting to feel we should not try to rush group commit into 8.3, unless 
it somehow falls out of the commit-no-wait patch by accident, given that 
we're past feature freeze and coming up with a proper group commit 
algorithm would need a lot of research and testing. Better do it for 8.4 
with more time, we've got enough features on plate for 8.3 anyway.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Group Commit

2007-04-10 Thread Zeugswetter Andreas ADI SD

> > > I've been working on the patch to enhance our group commit
behavior. 
> > > The patch is a dirty hack at the moment, but I'm settled on the 
> > > algorithm I'm going to use and I know the issues involved.
> > 
> > One question that just came to mind is whether Simon's
no-commit-wait 
> > patch doesn't fundamentally alter the context of discussion for
this.
> > Aside from the prospect that people won't really care about group 
> > commit if they can just use the periodic-WAL-sync approach, ISTM
that 
> > one way to get group commit is to just make everybody wait for the 
> > dedicated WAL writer to write their commit record.

Yes good catch, I think we will want to merge the two.
But, you won't want to wait indefinitely, since imho the dedicated WAL
writer will primarily only want to write/flush full WAL pages. Maybe
flush half full WAL pages only after some longer timeout. But basically
this timeout should be longer than an individual backend is willing to
delay their commit.

> >  With a 
> > sufficiently short delay between write/fsync attempts in the 
> > background process, won't that net out at about the same place as a 
> > complicated group-commit patch?

I don't think we want the delay so short, or we won't get any grouped
writes.

I think what we could do is wait up to commit_delay for the
dedicated WAL writer to do it's work. If it did'nt do it until timeout
let the backend do the flushing itself.

> I think the big question is whether commit_delay is ever going to be
generally useful.

It is designed to allow a higher transaction/second rate on a constantly
WAL bottlenecked system, so I think it still has a use case. I think you
should not compare it to no-commit-wait from the feature side (only
implementation).

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq