Re: [HACKERS] CIC and deadlocks
"Pavan Deolasee" <[EMAIL PROTECTED]> writes: > When I looked at the code, it occurred to me that possibly we are > OK with just taking shared lock on the procarray. That means that > some other transaction can concurrently set its serializable snapshot > while we are scanning the procarray. But that should not harm us: > if we see the snapshot set, we wait for the transaction. A transaction > which is setting its serializable snapshot NOW, can not see the > tuples that we did not index, isn't it ? [ itch... ] The problem is with time-extended execution of GetSnapshotData; what happens if the other guy lost the CPU for a good long time while in the middle of GetSnapshotData? He might set his xmin based on info you saw as long gone. You might be correct that it's safe, but the argument would have to hinge on the OldestXmin process being unable to commit because of someone holding shared ProcArrayLock; a point you are definitely not making above. (Study the comments in GetSnapshotData for awhile, also those in xact.c's commit-related code.) I'm about to head to bed and am certainly in no condition to carry the proof through. Have at it ... regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [DOCS] uuid type not documented
On 4/10/07, Peter Eisentraut <[EMAIL PROTECTED]> wrote: The problem is that most of the standard methods are platform dependent, as they require MAC addresses or a "good" random source, for instance. FYI: good random source is already available in pgcrypto, it uses either OpenSSL RAND_bytes() or internal strong PRNG. -- marko ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Question about SHM_QUEUE
ITAGAKI Takahiro <[EMAIL PROTECTED]> writes: > I have a question about SHM_QUEUE. Why do we need this component? It's a hangover from Berkeley days that no one has felt a need to remove yet. The convention back then was that shared memory might be mapped to different addresses in different processes. We've since adopted the assumption that everyone will see the same addresses, but we have not made any attempt to eradicate the old approach everywhere. > Then, can we replace SHM_QUEUE by a pointer-based double-linked list? > It will be an "intrusive" version of Dllist. What exactly will you gain by it? I'm not inclined to fool with that code for trivial reasons ... regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] CIC and deadlocks
On 4/1/07, Tom Lane <[EMAIL PROTECTED]> wrote: Good point. I'm envisioning a procarray.c function along the lines of bool TransactionHasSnapshot(xid) which returns true if the xid is currently listed in PGPROC and has a nonzero xmin. CIC's cleanup wait loop would check this and ignore the xid if it returns false. Your point means that this function would have to take exclusive not shared lock while scanning the procarray, which is kind of annoying, but it seems not fatal since CIC isn't done all that frequently. When I looked at the code, it occurred to me that possibly we are OK with just taking shared lock on the procarray. That means that some other transaction can concurrently set its serializable snapshot while we are scanning the procarray. But that should not harm us: if we see the snapshot set, we wait for the transaction. A transaction which is setting its serializable snapshot NOW, can not see the tuples that we did not index, isn't it ? A patch based on the discussion is attached. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com CIC_deadlock.patch Description: Binary data ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] prepared statements logging
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Tue, Apr 10, 2007 at 02:53:32AM -0700, marcofuics wrote: > Hi * > I am using the postgresql-8.2.3, with a jdbc-8.2-504 (the GeoNet > webServer tool...) My question is : > Is the <> server able to log the whole <> query? > {made by a prepared statement} > Looking at the log I can see only queries of the type: > TIP 5: don't forget to increase your free space map settings Wrong list. This is one dedicated for people working on the innards of PostgreSQL. You'd want one of or <[EMAIL PROTECTED]>. Now to your question: there are many configuration parameters in the config file (possibly in /etc/postgresql/8.2/main/postgresql.conf or some similar location, depending on your OS and distribution) to control the level of logging. Look there for entries named log_xxx and debug_xxx. HTH - -- tomás -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFGHHpCBcgs9XrR2kYRAs4SAJ45gvCVcp4Avk4t7NgQ/SST0s35owCggdWQ CcVc9UluRe4d/Gl8ERV7z4I= =P8EJ -END PGP SIGNATURE- ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Question about SHM_QUEUE
Hello, I have a question about SHM_QUEUE. Why do we need this component? We've already made some modules under the assumption that the base offset of shared memory is mapped to the same address for all processes. See comment in freespace.h: * Note: we handle pointers to these items as pointers, not as SHMEM_OFFSETs. * This assumes that all processes accessing the map will have the shared * memory segment mapped at the same place in their address space. Then, can we replace SHM_QUEUE by a pointer-based double-linked list? It will be an "intrusive" version of Dllist. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]
Kris Kennaway <[EMAIL PROTECTED]> writes: > On Wed, Apr 11, 2007 at 01:03:50AM -0400, Tom Lane wrote: >> Well, the thing is, we've pretty much had it handed to us that >> current-command indicators that aren't up to date are not very useful. >> So rate-limited updates strike me as a useless compromise. > I don't get your argument - ps auxww is never going to be 100% > up-to-date because during the time the command is running the status > may change. Of course. But we have already done the update-once-every-half-second bit --- that was how pg_stat_activity used to work --- and our users made clear that it's not good enough. So I don't see us expending significant effort to convert the setproctitle code path to that approach. The clear way of the future for expensive-setproctitle platforms is just to turn it off entirely and rely on the new pg_stat_activity implementation. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Anyone interested in improving postgresql scaling?
On Tue, Apr 10, 2007 at 05:36:17PM -0400, Tom Lane wrote: > Kris Kennaway <[EMAIL PROTECTED]> writes: > > I have not studied the exact code path, but there are indeed multiple > > wakeups happening from the semaphore code (as many as the number of > > active postgresql processes). It is easy to instrument > > sleepq_broadcast() and log them when they happen. > > There are certainly cases where Postgres will wake up a number of > processes in quick succession, but that should happen from a separate > semop() kernel call, on a different semaphore, for each such process. > If there's really multiple processes being released by the same semop() > then there's a bug we need to look into (or maybe it's a kernel bug?). > Anyway I'd be interested to know what the test case is, and which PG > version you were testing. I used 8.2 (and some older version when I first noticed it a year ago) and either sysbench or supersmack will show it - presumably anything that makes simultaneous queries. Just instrument sleepq_broadcast() to e.g. log a KTR event when it wakes more than 1 process and you'll see it happening. Kris pgptMLonITGtT.pgp Description: PGP signature
Re: [HACKERS] Anyone interested in improving postgresql scaling?
On Tue, Apr 10, 2007 at 02:46:56PM -0400, Tom Lane wrote: > Kris Kennaway <[EMAIL PROTECTED]> writes: > >>> Make SYSV semaphores less dumb about process wakeups. Currently > >>> whenever the semaphore state changes, all processes sleeping on the > >>> semaphore are woken, even if we only have released enough resources > >>> for one waiting process to claim. > > >> Correct. The behavior Kris describes is surely bad, but it's not > >> relevant to Postgres' usage of SysV semaphores. > > > Sorry, but the behaviour is real. > > Oh, I'm sure the BSD kernel acts as you describe. But Mark's point is > that Postgres never has more than one process waiting on any particular > SysV semaphore, and so the problem doesn't really affect us. > > Or do you mean that the kernel wakes all processes sleeping on *any* > SysV semaphore? That would be nasty :-( To be clear, some behaviour that postgresql does with sysv semaphores causes wakeups of many processes at once. i.e. if you have 20 clients, you will get up to 20 wakeups. I haven't studied the precise cause of this, but it is empirically true. This is the scaling problem I described, and it's what mux's patch addresses. Kris pgp00SdLk8acL.pgp Description: PGP signature
Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]
On Wed, Apr 11, 2007 at 01:03:50AM -0400, Tom Lane wrote: > Kris Kennaway <[EMAIL PROTECTED]> writes: > > I think the high number of setproctitle() calls are more problematic > > to us at the moment, Kris can comment on that. > > > Since we've basically had it handed to us that calling setproctitle() > > thousands of times per second is something that real applications now > > do, we're pretty much forced to work on making it cheaper. > > ... > > However this won't help all the existing systems out there (including > > other affected OSes), so it would be great if you guys could meet us > > half way and find a way to make postgresql rate-limit these calls by > > default to some suitable compromise rate, like once/second or > > whatever. > > Well, the thing is, we've pretty much had it handed to us that > current-command indicators that aren't up to date are not very useful. > So rate-limited updates strike me as a useless compromise. We have > the "real" solution (status advertised in PG's shared memory) already, > so the question in my mind is just how fast DBAs will wish to transition > to looking at "select * from pg_stat_activity" instead of looking at > "ps auxww". I don't get your argument - ps auxww is never going to be 100% up-to-date because during the time the command is running the status may change. So we already know that stats being a fraction of a second out of date are acceptable to users, because that's what may happen when you run ps in the present model. So you can use this to get away with limiting updates to e.g. 10/second and in practise no users will notice the difference. Updating thousands of times a second just on the off chance that an admin may one day run ps is completely inefficient (and has a huge overhead on non-Linux systems, so it's demonstrably not a sensible way to do things), and to the extent that there is a problem to be solved it isn't even really solving it anyway. If there really are users who find 10 proctitle updates/second an unacceptably low update rate, then tune for the default case and provide an option to allow them to override the rate limit to whatever update rate they find appropriate. Kris pgpReJUB51aNd.pgp Description: PGP signature
Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]
On Wed, Apr 11, 2007 at 12:50:06PM +1200, Mark Kirkwood wrote: > Tom Lane wrote: > > > > >>I think the high number of setproctitle() calls are more problematic > >>to us at the moment, Kris can comment on that. > > > >As of PG 8.2 it is possible to turn those off. I don't think there's a > >lot of enthusiasm for turning them off by default ... at least not yet. > >But it might make sense to point out in the PG documentation that > >update_process_title is particularly costly on platforms X, Y, and Z. > >Do you know if this issue affects all the BSDen equally? > > > > > Might be good to turn off by default for the 8.2+ Postgresql versions in > the FreeBSD ports tree (looks like postgresql.conf.sample is being > patched anyway, so pretty easy to amend). Yeah, we might end up doing this, but I consider it a workaround. Kris pgpBiA1MKrVDJ.pgp Description: PGP signature
Re: [HACKERS] Anyone interested in improving postgresql scaling?
On Tue, Apr 10, 2007 at 06:26:37PM -0400, Tom Lane wrote: > Kris Kennaway <[EMAIL PROTECTED]> writes: > > On Tue, Apr 10, 2007 at 05:36:17PM -0400, Tom Lane wrote: > >> Anyway I'd be interested to know what the test case is, and which PG > >> version you were testing. > > > I used 8.2 (and some older version when I first noticed it a year ago) > > and either sysbench or supersmack will show it - presumably anything > > that makes simultaneous queries. Just instrument sleepq_broadcast() > > to e.g. log a KTR event when it wakes more than 1 process and you'll > > see it happening. > > Sorry, I'm not much of a BSD kernel hacker ... but sleepq_broadcast > seems a rather generic name. Is that called *only* from semop? It's part of how wakeup() is implemented. > I'm wondering if you are seeing simultaneous wakeup from some other > cause --- sleep timeout being the obvious possibility. We are aware > of behaviors (search the PG lists for "context swap storm") where a > number of backends will all fail to get a spinlock and do short usleep > or select-timeout waits. In this situation they'd all wake up at the > next scheduler clock tick ... Nope, it's not this. Kris pgpa4cQe39p9O.pgp Description: PGP signature
Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]
On Tue, Apr 10, 2007 at 08:23:36PM -0400, Tom Lane wrote: > > I think the high number of setproctitle() calls are more problematic > > to us at the moment, Kris can comment on that. > > As of PG 8.2 it is possible to turn those off. I don't think there's a > lot of enthusiasm for turning them off by default ... at least not yet. > But it might make sense to point out in the PG documentation that > update_process_title is particularly costly on platforms X, Y, and Z. > Do you know if this issue affects all the BSDen equally? It will likely affect them to some extent. In fact the only platforms it will not hurt on are those which have already jumped through special hoops to make setproctitle() super-cheap. I presume Linux is in this category but don't know which others are, if any. Since we've basically had it handed to us that calling setproctitle() thousands of times per second is something that real applications now do, we're pretty much forced to work on making it cheaper. Hopefully this is something that will be addressed over the next few months (we're going to look at adding support for pages shared between libc and kernel so this kind of thing can be done without requiring a syscall). However this won't help all the existing systems out there (including other affected OSes), so it would be great if you guys could meet us half way and find a way to make postgresql rate-limit these calls by default to some suitable compromise rate, like once/second or whatever. Kris pgpxMmw1cbEgN.pgp Description: PGP signature
Re: [HACKERS] Anyone interested in improving postgresql scaling?
On Tue, Apr 10, 2007 at 03:52:00PM -0400, Tom Lane wrote: > Kris Kennaway <[EMAIL PROTECTED]> writes: > > On Tue, Apr 10, 2007 at 02:46:56PM -0400, Tom Lane wrote: > >> Oh, I'm sure the BSD kernel acts as you describe. But Mark's point is > >> that Postgres never has more than one process waiting on any particular > >> SysV semaphore, and so the problem doesn't really affect us. > > > To be clear, some behaviour that postgresql does with sysv semaphores > > causes wakeups of many processes at once. i.e. if you have 20 > > clients, you will get up to 20 wakeups. I haven't studied the precise > > cause of this, but it is empirically true. This is the scaling > > problem I described, and it's what mux's patch addresses. > > [ shrug... ] To the extent that that happens, it's Postgres' own issue, > and no amount of kernel rejiggering will change it. But I certainly > have no objection to a patch that fixes the kernel behavior ... As we've discussed before, by far the bigger issue with postgresql performance on FreeBSD is the default setting of update_process_titles=on. Kris pgpxNR2bN01jL.pgp Description: PGP signature
Re: [HACKERS] Idle idea for a feature
Hi, On 4/11/07, Guillaume Smet <[EMAIL PROTECTED]> wrote: On 4/10/07, Tom Lane <[EMAIL PROTECTED]> wrote: > ISTM it'd be a good idea if it did, as "are there any incoming > foreign keys" seems to be a question we constantly ask when solving > update-performance problems, and there isn't any easy way to check for > such. On similar lines, maybe \d can also show the list of inheritors when invoked on a parent. e.g: postgres=# \d parent Table "public.parent" Column | Type | Modifiers +-+--- a | integer | Indexes: "parent_a_key" UNIQUE, btree (a) Inherited by: child Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
Ühel kenal päeval, T, 2007-04-10 kell 18:17, kirjutas Joshua D. Drake: > > In terms of idle time for gzip and other command to archive WAL offline, > > no difference in the environment was given other than the command to > > archive. My guess is because the user time is very large in gzip, it > > has more chance for scheduler to give resource to other processes. In > > the case of cp, idle time is more than 30times longer than user time. > > Pg_compresslog uses seven times longer idle time than user time. On the > > other hand, gzip uses less idle time than user time. Considering the > > total amount of user time, I think it's reasonable measure. > > > > Again, in my proposal, it is not the issue to increase run time > > performance. Issue is to decrease the size of archive log to save the > > storage. > > Considering the relatively little amount of storage a transaction log > takes, it would seem to me that the performance angle is more appropriate. As I understand it it's not about transaction log but about write-ahead log. and the amount of data in WAL can become very important once you have to keep standby servers in different physical locations (cities, countries or continents) where channel throughput and cost comes into play. With simple cp (scp/rsync) the amount of WAL data needing to be copied is about 10x more than data collected by trigger based solutions (Slony/pgQ). With pg_compresslog WAL-shipping seems to have roughly the same amount and thus becomes a viable alternative again. > Is it more efficient in other ways besides negligible tps? Possibly more > efficient memory usage? Better restore times for a crashed system? I think that TPS is more affected by number of writes than size of each block written, so there is probably not that much to gain in TPS, except perhaps from better disk cache usage. For me pg_compresslog seems to be a winner even if it just does not degrade performance. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [DOCS] uuid type not documented
# [EMAIL PROTECTED] / 2007-04-10 15:49:08 -0400: > Neil Conway wrote: > >On Tue, 2007-04-10 at 18:28 +0200, Peter Eisentraut wrote: > > > >>The problem is that most of the standard methods are platform dependent, > >>as they require MAC addresses or a "good" random source, for instance. > >> > > > >http://archives.postgresql.org/pgsql-patches/2007-01/msg00392.php > > > >ISTM random() or similar sources is a sufficient PSRNG for the purposes > >of UUID generation -- I can't see anything in the RFC that would > >contradict that. > > > > > > How about we set up a contrib (I wish we'd fixed that) module with an > example function or two? Thought I'd mention Ralph Engelschall's uuid library, comes with a PostgreSQL binding: http://www.ossp.org/pkg/lib/uuid/ http://cvs.ossp.org/fileview?f=ossp-pkg/uuid/pgsql/uuid.txt&v=1.1 I only played with it some time ago. -- How many Vietnam vets does it take to screw in a light bulb? You don't know, man. You don't KNOW. Cause you weren't THERE. http://bash.org/?255991 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]
Kris Kennaway <[EMAIL PROTECTED]> writes: > I think the high number of setproctitle() calls are more problematic > to us at the moment, Kris can comment on that. > Since we've basically had it handed to us that calling setproctitle() > thousands of times per second is something that real applications now > do, we're pretty much forced to work on making it cheaper. > ... > However this won't help all the existing systems out there (including > other affected OSes), so it would be great if you guys could meet us > half way and find a way to make postgresql rate-limit these calls by > default to some suitable compromise rate, like once/second or > whatever. Well, the thing is, we've pretty much had it handed to us that current-command indicators that aren't up to date are not very useful. So rate-limited updates strike me as a useless compromise. We have the "real" solution (status advertised in PG's shared memory) already, so the question in my mind is just how fast DBAs will wish to transition to looking at "select * from pg_stat_activity" instead of looking at "ps auxww". I don't see anything wrong at all with making update_process_title default to "off" in BSD-specific packaging of Postgres. It's a harder sell to turn it off by default everywhere, because of all them Linux users for whom that's just taking away a convenient status viewing method. I think we might get there eventually, but we need a decent interval to wean people away from the old method. [ Disclaimer: I work for Red Hat, so am unlikely to favor doing anything that is a loss on Linux. But I do use and like other platforms too; just don't happen to have any BSD in-house currently, unless you're willing to count Darwin as BSD. ] regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
> In terms of idle time for gzip and other command to archive WAL offline, > no difference in the environment was given other than the command to > archive. My guess is because the user time is very large in gzip, it > has more chance for scheduler to give resource to other processes. In > the case of cp, idle time is more than 30times longer than user time. > Pg_compresslog uses seven times longer idle time than user time. On the > other hand, gzip uses less idle time than user time. Considering the > total amount of user time, I think it's reasonable measure. > > Again, in my proposal, it is not the issue to increase run time > performance. Issue is to decrease the size of archive log to save the > storage. Considering the relatively little amount of storage a transaction log takes, it would seem to me that the performance angle is more appropriate. Is it more efficient in other ways besides negligible tps? Possibly more efficient memory usage? Better restore times for a crashed system? Sincerely, Joshua D. Drake > > Regards; > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
Hi, In the case below, we run DBT-2 benchmark for one hour to get the measure. Checkpoint occured three times (checkpoint interval was 20min). For more information, when checkpoint interval is one hour, the amount of the archived log size was as follows: cp: 3.1GB gzip: 1.5GB pg_compresslog: 0.3GB For both cases, database size was 12.7GB, relatively small. As pointed out, if we don't run the checkpoint forever, the value for cp will become close to that for pg_compresslog, but it is not practical. The point here is, if we collect archive log with cp and the average work load is a quarter of the full power, cp archiving will produce about 0.8GB archive log per hour (for DBT-2 case, of course the size depends on the nature of the transaction). If we run the database whole day, the amount of the archive log will be as large as database itself. After one week, archive log size gets seven times as large as the database itself. This is the point. In production, such large archive log will raise storage cost. The purpose of the proposal is not to improve the performance, but to decrease the size of archive log to save necessary storage, preserving the same chance of recovery at the crash recovery as full_page_writes=on. Because of DBT-2 nature, it is not meaningful to compare the throuput (databsae size determines the number of transactions to run). Instead, I compared the throuput using pgbench. These measures are: cp: 570tps, gzip:558tps, pg_compresslog: 574tps, negligible difference. In terms of idle time for gzip and other command to archive WAL offline, no difference in the environment was given other than the command to archive. My guess is because the user time is very large in gzip, it has more chance for scheduler to give resource to other processes. In the case of cp, idle time is more than 30times longer than user time. Pg_compresslog uses seven times longer idle time than user time. On the other hand, gzip uses less idle time than user time. Considering the total amount of user time, I think it's reasonable measure. Again, in my proposal, it is not the issue to increase run time performance. Issue is to decrease the size of archive log to save the storage. Regards; Tom Lane wrote: > Koichi Suzuki <[EMAIL PROTECTED]> writes: >> My proposal is to remove unnecessary full page writes (they are needed >> in crash recovery from inconsistent or partial writes) when we copy WAL >> to archive log and rebuilt them as a dummy when we restore from archive >> log. >> ... >> Benchmark: DBT-2 >> Database size: 120WH (12.3GB) >> Total WAL size: 4.2GB (after 60min. run) >> Elapsed time: >>cp:120.6sec >>gzip: 590.0sec >>pg_compresslog: 79.4sec >> Resultant archive log size: >>cp: 4.2GB >>gzip: 2.2GB >>pg_compresslog: 0.3GB >> Resource consumption: >>cp: user: 0.5sec system: 15.8sec idle: 16.9sec I/O wait: 87.7sec >>gzip: user: 286.2sec system: 8.6sec idle: 260.5sec I/O wait: 36.0sec >>pg_compresslog: >> user: 7.9sec system: 5.5sec idle: 37.8sec I/O wait: 28.4sec > > What checkpoint settings were used to make this comparison? I'm > wondering whether much of the same benefit can't be bought at zero cost > by increasing the checkpoint interval, because that translates directly > to a reduction in the number of full-page images inserted into WAL. > > Also, how much was the database run itself slowed down by the increased > volume of WAL (due to duplicated information)? It seems rather > pointless to me to measure only the archiving effort without any > consideration for the impact on the database server proper. > > regards, tom lane > > PS: there's something fishy about the gzip numbers ... why all the idle > time? > > ---(end of broadcast)--- > TIP 4: Have you searched our list archives? > >http://archives.postgresql.org > -- Koichi Suzuki ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]
Tom Lane wrote: I think the high number of setproctitle() calls are more problematic to us at the moment, Kris can comment on that. As of PG 8.2 it is possible to turn those off. I don't think there's a lot of enthusiasm for turning them off by default ... at least not yet. But it might make sense to point out in the PG documentation that update_process_title is particularly costly on platforms X, Y, and Z. Do you know if this issue affects all the BSDen equally? Might be good to turn off by default for the 8.2+ Postgresql versions in the FreeBSD ports tree (looks like postgresql.conf.sample is being patched anyway, so pretty easy to amend). Cheers Mark ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]
Maxime Henrion <[EMAIL PROTECTED]> writes: > Thanks for forwarding my mail, Kris! To Tom: if you can get my mails > to reach pgsql-hackers@ somehow that would be just great :-). They'll get approved eventually, just like mine to the BSD lists will get approved eventually ;-) >> The only thing we could do to fix that from our end would be to use >> a smaller sema-set size on *BSD platforms. Is the overhead per sema set >> small enough to make this a sane thing to do? Will we be likely to >> run into system limits on the number of sets? > I'm not familiar enough with the PostgreSQL code to know what impact > such a change could have, but since the problem is clearly on our > side here, I would advise against doing changes in PostgreSQL that > are likely to complicate the code for little gain. We still didn't > even fully measure how much the useless wakups cost us since we're > running into other contention problems with my patch that removes > those. And, as you point out, there are complications ensuing with > respect to system limits (we already ask users to bump them when > they install PostgreSQL). OK, it was just an off-the-cuff idea. > I think the high number of setproctitle() calls are more problematic > to us at the moment, Kris can comment on that. As of PG 8.2 it is possible to turn those off. I don't think there's a lot of enthusiasm for turning them off by default ... at least not yet. But it might make sense to point out in the PG documentation that update_process_title is particularly costly on platforms X, Y, and Z. Do you know if this issue affects all the BSDen equally? regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [EMAIL PROTECTED]: Re: Anyone interested in improving postgresql scaling?]
Kris Kennaway <[EMAIL PROTECTED]> forwards: > Yes but there are still a lot of wakeups to be avoided in the current > System V semaphore code. More specifically, not only do we wakeup all > the processes waiting on a single semaphore everytime something changes, > but we also wakeup all processes waiting on *any* of the semaphore in > the semaphore *set*, whatever the reason we're sleeping. O ... *that's* the problem. Ugh. Although we have a separate semaphore for each PG backend, they're grouped into semaphore sets (I think 16 active semaphores per set). So a wakeup intended for one process would uselessly send up to 15 others through the semop code. The only thing we could do to fix that from our end would be to use a smaller sema-set size on *BSD platforms. Is the overhead per sema set small enough to make this a sane thing to do? Will we be likely to run into system limits on the number of sets? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Anyone interested in improving postgresql scaling?
Kris Kennaway <[EMAIL PROTECTED]> writes: > On Tue, Apr 10, 2007 at 05:36:17PM -0400, Tom Lane wrote: >> Anyway I'd be interested to know what the test case is, and which PG >> version you were testing. > I used 8.2 (and some older version when I first noticed it a year ago) > and either sysbench or supersmack will show it - presumably anything > that makes simultaneous queries. Just instrument sleepq_broadcast() > to e.g. log a KTR event when it wakes more than 1 process and you'll > see it happening. Sorry, I'm not much of a BSD kernel hacker ... but sleepq_broadcast seems a rather generic name. Is that called *only* from semop? I'm wondering if you are seeing simultaneous wakeup from some other cause --- sleep timeout being the obvious possibility. We are aware of behaviors (search the PG lists for "context swap storm") where a number of backends will all fail to get a spinlock and do short usleep or select-timeout waits. In this situation they'd all wake up at the next scheduler clock tick ... regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Anyone interested in improving postgresql scaling?
On 2007-04-10, Tom Lane <[EMAIL PROTECTED]> wrote: > Kris Kennaway <[EMAIL PROTECTED]> writes: >> I have not studied the exact code path, but there are indeed multiple >> wakeups happening from the semaphore code (as many as the number of >> active postgresql processes). It is easy to instrument >> sleepq_broadcast() and log them when they happen. > > There are certainly cases where Postgres will wake up a number of > processes in quick succession, but that should happen from a separate > semop() kernel call, on a different semaphore, for each such process. > If there's really multiple processes being released by the same semop() > then there's a bug we need to look into (or maybe it's a kernel bug?). > Anyway I'd be interested to know what the test case is, and which PG > version you were testing. This is a problem in FreeBSD, not specifically to do with postgres - the granularity for SysV semaphore wakeups in FreeBSD-6.x and earlier is the entire semaphore set, not just one specific semaphore within the set. I explained that to Kris some weeks ago, and someone (mux) did a patch (to FreeBSD, not pg) which was already mentioned in this discussion. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Anyone interested in improving postgresql scaling?
Kris Kennaway <[EMAIL PROTECTED]> writes: > I have not studied the exact code path, but there are indeed multiple > wakeups happening from the semaphore code (as many as the number of > active postgresql processes). It is easy to instrument > sleepq_broadcast() and log them when they happen. There are certainly cases where Postgres will wake up a number of processes in quick succession, but that should happen from a separate semop() kernel call, on a different semaphore, for each such process. If there's really multiple processes being released by the same semop() then there's a bug we need to look into (or maybe it's a kernel bug?). Anyway I'd be interested to know what the test case is, and which PG version you were testing. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [DOCS] uuid type not documented
Neil Conway <[EMAIL PROTECTED]> writes: > On Tue, 2007-04-10 at 18:28 +0200, Peter Eisentraut wrote: >> The problem is that most of the standard methods are platform dependent, as >> they require MAC addresses or a "good" random source, for instance. > http://archives.postgresql.org/pgsql-patches/2007-01/msg00392.php > ISTM random() or similar sources is a sufficient PSRNG for the purposes > of UUID generation -- I can't see anything in the RFC that would > contradict that. Doesn't seem like quite enough bits of uniqueness. We could improve matters by incorporating the database's pg_control.system_identifier into the UUID, substituting for the MAC address we don't have a good way to get. system_identifier is currently determined by the system clock at initdb time (to gettimeofday precision), so it would add at least some additional bits of randomness... regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Idle idea for a feature
On 4/10/07, Tom Lane <[EMAIL PROTECTED]> wrote: ISTM it'd be a good idea if it did, as "are there any incoming foreign keys" seems to be a question we constantly ask when solving update-performance problems, and there isn't any easy way to check for such. Sure. We wrote a stored proc to do that and it could be really nice to have it by defaut in the table definition. Something like: Incoming foreign keys: "name_of_the_fkey" (column of this table) REFERENCED BY table(columns of the other table) could be nice. Not sure about the title of the section. Another problem is that this syntax is not based on valid SQL syntax. -- Guillaume ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [DOCS] uuid type not documented
Neil Conway wrote: On Tue, 2007-04-10 at 18:28 +0200, Peter Eisentraut wrote: The problem is that most of the standard methods are platform dependent, as they require MAC addresses or a "good" random source, for instance. http://archives.postgresql.org/pgsql-patches/2007-01/msg00392.php ISTM random() or similar sources is a sufficient PSRNG for the purposes of UUID generation -- I can't see anything in the RFC that would contradict that. Maybe a short-term solution could be a UUID-generated function that takes some kind of seed as a parameter. People not concerned about collisons could just pass some random value, while others could use the mac-address of the client or something similar. greetings, Florian Pflug ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Anyone interested in improving postgresql scaling?
Kris Kennaway <[EMAIL PROTECTED]> writes: > On Tue, Apr 10, 2007 at 02:46:56PM -0400, Tom Lane wrote: >> Oh, I'm sure the BSD kernel acts as you describe. But Mark's point is >> that Postgres never has more than one process waiting on any particular >> SysV semaphore, and so the problem doesn't really affect us. > To be clear, some behaviour that postgresql does with sysv semaphores > causes wakeups of many processes at once. i.e. if you have 20 > clients, you will get up to 20 wakeups. I haven't studied the precise > cause of this, but it is empirically true. This is the scaling > problem I described, and it's what mux's patch addresses. [ shrug... ] To the extent that that happens, it's Postgres' own issue, and no amount of kernel rejiggering will change it. But I certainly have no objection to a patch that fixes the kernel behavior ... regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] "select ('{}'::text[])[1]" returns NULL -- is it correct?
Nikolay Samokhvalov wrote: On 4/10/07, Tom Lane <[EMAIL PROTECTED]> wrote: "Nikolay Samokhvalov" <[EMAIL PROTECTED]> writes: > I remember several cases when people (e.g. me :-) ) were spending some > time trying to find an error in some pl/pgsql function and the reason > lied in incorrect work with arrays (i.e. messages like "index is out > of bounds" and "index cannot be negative number" would help, surely). Well, if indexes *couldn't* be negative numbers then that might be helpful, but they can. Ooops :-) OK, my proposal is narrowing to very simple one: what about triggering WARNINGs when user tries to access nonexistent element of array? Please don't ;-) There are two sane options - return an error, or return NULL. Both are sensible, and different programming languages make different choices. The only reason for a WARNING would be a long-term plan to change the existing behaviour. But this will cause lots of pain, for no real gain, because no matter which behaviour you pick, there are always situations where the other would be more convenient. Just look at the mess PHP has created by altering fundamental aspects of the language (4.4 -> 5.0). greetings, Florian Pflug ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [DOCS] uuid type not documented
Neil Conway wrote: On Tue, 2007-04-10 at 18:28 +0200, Peter Eisentraut wrote: The problem is that most of the standard methods are platform dependent, as they require MAC addresses or a "good" random source, for instance. http://archives.postgresql.org/pgsql-patches/2007-01/msg00392.php ISTM random() or similar sources is a sufficient PSRNG for the purposes of UUID generation -- I can't see anything in the RFC that would contradict that. How about we set up a contrib (I wish we'd fixed that) module with an example function or two? cheers andrew ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [DOCS] uuid type not documented
On Tue, 2007-04-10 at 18:28 +0200, Peter Eisentraut wrote: > The problem is that most of the standard methods are platform dependent, as > they require MAC addresses or a "good" random source, for instance. http://archives.postgresql.org/pgsql-patches/2007-01/msg00392.php ISTM random() or similar sources is a sufficient PSRNG for the purposes of UUID generation -- I can't see anything in the RFC that would contradict that. -Neil ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] "select ('{}'::text[])[1]" returns NULL -- is it correct?
On 4/10/07, Tom Lane <[EMAIL PROTECTED]> wrote: "Nikolay Samokhvalov" <[EMAIL PROTECTED]> writes: > I remember several cases when people (e.g. me :-) ) were spending some > time trying to find an error in some pl/pgsql function and the reason > lied in incorrect work with arrays (i.e. messages like "index is out > of bounds" and "index cannot be negative number" would help, surely). Well, if indexes *couldn't* be negative numbers then that might be helpful, but they can. Ooops :-) OK, my proposal is narrowing to very simple one: what about triggering WARNINGs when user tries to access nonexistent element of array? -- Best regards, Nikolay ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] "select ('{}'::text[])[1]" returns NULL -- is it correct?
On 4/9/07, Tom Lane <[EMAIL PROTECTED]> wrote: "Nikolay Samokhvalov" <[EMAIL PROTECTED]> writes: > As I can see here, when I ask for element that doesn't exist, the > database returns NULL for me. Maybe it's well-known issue (and > actually I understood this behaviour before), but strictly speaking it > seems wrong for me: the database _knows_ that there is no element, so > why NULL? [...] AFAIR it's always been like that, so changing it seems exceedingly likely to break some peoples' applications. It's not completely without analogies in SQL, anyway: consider the behavior of INSERT when fewer columns are provided than the table has. Pretending that elements outside the stored range of the array are null is not all that different from silently adding nulls to a row-to-be-stored. OK, I see. But if I try to INSERT to column that doesn't exist in the table, I have an error. Why pg's arrays are designed so that postgres doesn't produce errors for attempts to access nonexistent element of array? Why there is no simple sanity check (SELECT (ARRAY[6,8])[-1] -- works w/o an error)? I remember several cases when people (e.g. me :-) ) were spending some time trying to find an error in some pl/pgsql function and the reason lied in incorrect work with arrays (i.e. messages like "index is out of bounds" and "index cannot be negative number" would help, surely). -- Best regards, Nikolay ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] prepared statements logging
Hi * I am using the postgresql-8.2.3, with a jdbc-8.2-504 (the GeoNet webServer tool...) My question is : Is the <> server able to log the whole <> query? {made by a prepared statement} Looking at the log I can see only queries of the type: (cut). LOG: execute : SELECT * FROM Metadata WHERE id=$1 DETAIL: parameters: $1 = '28' DEBUG: parse : SELECT schemaId, createDate, lastChangeDate, source, isTemplate, uuid FROM Metadata WHERE id = 28 (cut). The strange is that my query has to search on a "Test vector data" and in the whole log-file this word <> is not present! My desire is to have a line where I can see: SELECTbla..bla...bla FROM MetadataWHERE id = "...some pattern" [maybe islike some pattern] Thx in advance. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Anyone interested in improving postgresql scaling?
On Tue, Apr 10, 2007 at 10:23:42AM -0400, Tom Lane wrote: > Mark Kirkwood <[EMAIL PROTECTED]> writes: > > Kris Kennaway wrote: > >> If so, then your task is the following: > >> > >> Make SYSV semaphores less dumb about process wakeups. Currently > >> whenever the semaphore state changes, all processes sleeping on the > >> semaphore are woken, even if we only have released enough resources > >> for one waiting process to claim. i.e. there is a thundering herd > >> wakeup situation which destroys performance at high loads. Fixing > >> this will involve replacing the wakeup() calls with appropriate > >> amounts of wakeup_one(). > > > I'm forwarding this to the pgsql-hackers list so that folks more > > qualified than I can comment, but as I understand the way postgres > > implements locking each process has it *own* semaphore it waits on - > > and who is waiting for what is controlled by an in (shared) memory hash > > of lock structs (access to these is controlled via platform Dependant > > spinlock code). So a given semaphore state change should only involve > > one process wakeup. > > Correct. The behavior Kris describes is surely bad, but it's not > relevant to Postgres' usage of SysV semaphores. Sorry, but the behaviour is real. Kris pgphJTqz6La4j.pgp Description: PGP signature
Re: [HACKERS] Anyone interested in improving postgresql scaling?
On Tue, Apr 10, 2007 at 10:41:04PM +1200, Mark Kirkwood wrote: > Kris Kennaway wrote: > >If so, then your task is the following: > > > >Make SYSV semaphores less dumb about process wakeups. Currently > >whenever the semaphore state changes, all processes sleeping on the > >semaphore are woken, even if we only have released enough resources > >for one waiting process to claim. i.e. there is a thundering herd > >wakeup situation which destroys performance at high loads. Fixing > >this will involve replacing the wakeup() calls with appropriate > >amounts of wakeup_one(). > > I'm forwarding this to the pgsql-hackers list so that folks more > qualified than I can comment, but as I understand the way postgres > implements locking each process has it *own* semaphore it waits on - > and who is waiting for what is controlled by an in (shared) memory hash > of lock structs (access to these is controlled via platform Dependant > spinlock code). So a given semaphore state change should only involve > one process wakeup. I have not studied the exact code path, but there are indeed multiple wakeups happening from the semaphore code (as many as the number of active postgresql processes). It is easy to instrument sleepq_broadcast() and log them when they happen. Anyway mux@ fixed this some time ago, which indeed helped scaling for traffic over a local domain socket (particularly at higher loads), but I saw some anomalous results when using loopback TCP traffic. I think this is unrelated (in this situation TCP is highly contended, and it is often the case that fixing one bottleneck can make a highly contended situation perform worse, because you were effectively serializing a bit before, and reducing the non-linear behaviour) but am still investigating, so the patch has not yet been committed. Kris pgpDDvfRzeiGJ.pgp Description: PGP signature
Re: [HACKERS] [PATCHES] Fix mdsync never-ending loop problem
ITAGAKI Takahiro <[EMAIL PROTECTED]> writes: > Here is another patch to fix never-ending loop in mdsync. I introduced > a mdsync counter (cycle id) and cancel flags to fix the problem. > The mdsync counter is incremented at the every beginning of mdsync(). > Each pending entry has a field assigned from the counter when it is > newly inserted to pendingOpsTable. Only entries that have smaller counter > values than the mdsync counter are fsync-ed in mdsync(). > Another change is to add a cancel flag in each pending entry. When a > relation is dropped and bgwriter receives a forget-request, the corresponding > entry is marked as dropped but we don't delete it at that time. Actual > deletion is performed in the next fsync loop. We don't have to retry after > AbsorbFsyncRequests() because entries are not removed outside of seqscan. This patch looks fairly sane to me; I have a few small gripes about coding style but that can be fixed while applying. Heikki, you were concerned about the cycle-ID idea; do you have any objection to this patch? > This patch can be applied to HEAD, 8.2 and 8.1 with a few hunks. I don't think we should back-patch something that's a performance fix for an extreme case, especially not when it's not been through any extensive testing yet ... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PATCHES] Fix mdsync never-ending loop problem
(Sorry if you receive duplicate messages. I resend it since it was not delivered after a day.) Here is another patch to fix never-ending loop in mdsync. I introduced a mdsync counter (cycle id) and cancel flags to fix the problem. The mdsync counter is incremented at the every beginning of mdsync(). Each pending entry has a field assigned from the counter when it is newly inserted to pendingOpsTable. Only entries that have smaller counter values than the mdsync counter are fsync-ed in mdsync(). Another change is to add a cancel flag in each pending entry. When a relation is dropped and bgwriter receives a forget-request, the corresponding entry is marked as dropped but we don't delete it at that time. Actual deletion is performed in the next fsync loop. We don't have to retry after AbsorbFsyncRequests() because entries are not removed outside of seqscan. This patch can be applied to HEAD, 8.2 and 8.1 with a few hunks. Tom Lane <[EMAIL PROTECTED]> wrote: > > In my understanding from the discussion, we'd better to take "cycle ID" > > approach instead of "making a copy of pendingOpsTable", because duplicated > > table is hard to debug and requires us to pay attention not to leak > > memories. > > I'll adopt the cycle ID approach and build LDC on it as a separate patch. > > Heikki made some reasonable arguments against the cycle-ID idea. I'm > not intending to insist on it ... Regards, --- ITAGAKI Takahiro NTT Open Source Software Center fix_mdsync.patch Description: Binary data ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] [HACKERS] [Fwd: Index Advisor]
Hi Tom, The original patch was submitted by Kai Sattler, and we (at EDB) spent a lot of time improving it, making it as seamless and as user-friendly as possible. As is evident from the version number of the patch (v26), it has gone through a lot of iterations, and was available to the community for review and discussion (and discuss they did; they asked for a few things and those were added/improved). I am thinking the API needs to be simpified, perhaps by removing the system table and having the recommendations just logged to the server logs. This means that this very useful information (in log files) would need to be passed through an intermediary or another tool developed to allow access to this information. I think that having this available from a table would be very nice. In the initial submission, the feature was a big piece of code embedded inside the backend. It required a system table, did not show the new plan, actually created index physically before re planning, and could not advise for a running application (everything had to be manually EXPLAINed). I read through the thread titled "Index Tuning Features" that first discussed the idea of an Index adviser for PG, and this patch also meets quite a few requirements raised there. Here are a few of the good things about this patch as of now: .) Loadable plugin. Develop your own plugin to do nifty things with the plan generated by the planner. Just as the debugger is implemented; if no plugin... no work to do... .) No syntax change. Run your queries as they are and get the advice in the advise_index table (or set client_min_messages = LOG, to see the improved plan on the screen also, if any). .) Can recommend indexes even for the generated dynamic-queries, that are hard to regenerate in a dry-run. .) Can recommend indexes for SQL being executed through plpgsql (or any PL) (again, hard to regenerate the parameterized queries by hand), and the the advice is available in the advise_index table. .) The adviser dumps it's advice in a table named advise_index. That can be a user table, or a view with INSERT rule, or anything else; it should just be an INSERTable object, accessible to the executing user (as opposed to a system table required by the original implementation, and hence a need for initdb). .) No need to modify the application in any way; just set PGOPTIONS environment variable properly before executing the appln., and run it as usual... you have the advice generated for you. .) No need for DBA (or the appln. writer) to feed anything to the planner in any way; the process of recommendation is fully automated (this may change if another plugin implimentation requires the stats in some user table). .) Does recommend multi-column indexes. Does not make a set of each fathomable combination of table columns to develop multi-column indexes (hence avoiding a combinatorial explosion of time-space requirements); it uses the columns used in the query to generate multi-column indexes. .) The indexes are not created on disk; the index-tuple-size calculation function does a very good job of estimating the size of the virtual index. .) The changes to the catalog are just for the backend running under the adviser, no one else can see those virtual indexes (as opposed to the earlier implementation where the indexes were created on-disk, and available to all the backends in the planning phase). So, with one hook (no GUC variables!), we get all these cool things. I tried very hard to eliminate that one leftover kludge, but couldn't (we have two options here, and they are enclosed in '#if GLOBAL_CAND_LIST ... #else' parts of the code; left upto the committers to decide which one we need!). Another kludge that I had to add was the SPI_connect() and SPI_finish() frame around the savepoint handling, since the RollbackToSavepoint in xact.cassumes that only a PL/* module must be using the savepoint infrastucture (this was discussed on -hackers). The interface etc. may not be beautiful, but it isn't ugly either! It is a lot better than manually creating pg_index records and inserting them into cache; we use index_create() API to create the index (build is deferred), and then 'rollback to savepoint' to undo those changes when the advisor is done. index_create() causes pg_depends entries too, so a 'RB to SP' is far much safer than going and deleting cache records manually. I hope you would agree that we need two passes of planner, one without v-indexes and the other with v-indexes, for the backend to compare the costs, and recommend indexes only if the second plan turned out to be cheaper. If we implement the way you have suggested, then we will need one hook at the end of get_relation_info(), one in EXPLAIN code, and yet another, someplace after planner is finished, to do the comparison of the two plans and recommend only those indexes that were considered to be useful by the planner. (A total of three places to touch).
Re: [HACKERS] Anyone interested in improving postgresql scaling?
Kris Kennaway <[EMAIL PROTECTED]> writes: >>> Make SYSV semaphores less dumb about process wakeups. Currently >>> whenever the semaphore state changes, all processes sleeping on the >>> semaphore are woken, even if we only have released enough resources >>> for one waiting process to claim. >> Correct. The behavior Kris describes is surely bad, but it's not >> relevant to Postgres' usage of SysV semaphores. > Sorry, but the behaviour is real. Oh, I'm sure the BSD kernel acts as you describe. But Mark's point is that Postgres never has more than one process waiting on any particular SysV semaphore, and so the problem doesn't really affect us. Or do you mean that the kernel wakes all processes sleeping on *any* SysV semaphore? That would be nasty :-( regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PATCHES] Fix mdsync never-ending loop problem
I wrote: > This patch looks fairly sane to me; I have a few small gripes about > coding style but that can be fixed while applying. Heikki, you were > concerned about the cycle-ID idea; do you have any objection to this > patch? Actually, on second look I think the key idea here is Takahiro-san's introduction of a cancellation flag in the hashtable entries, to replace the cases where AbsorbFsyncRequests can try to delete entries. What that means is mdsync() doesn't need an outer retry loop at all: the periodic AbsorbFsyncRequests calls are not a hazard, and retry of FileSync failures can be handled as an inner loop on the single failing table entry. (We can make the failure counter a local variable, too, instead of needing space in every hashtable entry.) And with that change, it's no longer possible for an incoming stream of fsync requests to keep mdsync from terminating. It might fsync more than it really needs to, but it won't repeat itself, and it must reach the end of the hashtable eventually. So we don't actually need the cycle counter at all. It might be worth having the cycle counter anyway just to avoid doing "useless" fsync work. I'm not sure about this. If we have a cycle counter of say 32 bits, then it's theoretically possible for an fsync to fail 2^32 consecutive times and then be skipped on the next try, allowing a checkpoint to succeed that should not have. We can fix that with a few more lines of logic to detect a wrapped-around value, but is it worth the trouble? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
Koichi Suzuki <[EMAIL PROTECTED]> writes: > My proposal is to remove unnecessary full page writes (they are needed > in crash recovery from inconsistent or partial writes) when we copy WAL > to archive log and rebuilt them as a dummy when we restore from archive > log. > ... > Benchmark: DBT-2 > Database size: 120WH (12.3GB) > Total WAL size: 4.2GB (after 60min. run) > Elapsed time: >cp:120.6sec >gzip: 590.0sec >pg_compresslog: 79.4sec > Resultant archive log size: >cp: 4.2GB >gzip: 2.2GB >pg_compresslog: 0.3GB > Resource consumption: >cp: user: 0.5sec system: 15.8sec idle: 16.9sec I/O wait: 87.7sec >gzip: user: 286.2sec system: 8.6sec idle: 260.5sec I/O wait: 36.0sec >pg_compresslog: > user: 7.9sec system: 5.5sec idle: 37.8sec I/O wait: 28.4sec What checkpoint settings were used to make this comparison? I'm wondering whether much of the same benefit can't be bought at zero cost by increasing the checkpoint interval, because that translates directly to a reduction in the number of full-page images inserted into WAL. Also, how much was the database run itself slowed down by the increased volume of WAL (due to duplicated information)? It seems rather pointless to me to measure only the archiving effort without any consideration for the impact on the database server proper. regards, tom lane PS: there's something fishy about the gzip numbers ... why all the idle time? ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [DOCS] uuid type not documented
"Joshua D. Drake" <[EMAIL PROTECTED]> writes: > Peter Eisentraut wrote: >> The problem is that most of the standard methods are platform dependent, as >> they require MAC addresses or a "good" random source, for instance. I'm not >> sure how we wanted to solve that, but certainly leaving the uuid type with >> *no* method to generate one is pretty poor. > Actually, I would say that not having the ability to generate a UUID is > just fine. Most languages, have the ability to generate them per their > particular platforms. Let's leave it to them. Let us *not* open that can of worms for 8.3. This patch would not have got in except that it didn't attempt to solve that problem, and there's even less time available now. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [DOCS] uuid type not documented
Peter Eisentraut wrote: Am Dienstag, 10. April 2007 17:30 schrieb Neil Conway: On Tue, 2007-04-10 at 17:24 +0200, Peter Eisentraut wrote: The new uuid type is lacking documentation. We had also talked about including some UUID generation functionality in 8.3, but it should be okay to leave that for 8.4. The problem is that most of the standard methods are platform dependent, as they require MAC addresses or a "good" random source, for instance. I'm not sure how we wanted to solve that, but certainly leaving the uuid type with *no* method to generate one is pretty poor. Actually, I would say that not having the ability to generate a UUID is just fine. Most languages, have the ability to generate them per their particular platforms. Let's leave it to them. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [DOCS] uuid type not documented
Am Dienstag, 10. April 2007 17:30 schrieb Neil Conway: > On Tue, 2007-04-10 at 17:24 +0200, Peter Eisentraut wrote: > > The new uuid type is lacking documentation. > > We had also talked about including some UUID generation functionality in > 8.3, but it should be okay to leave that for 8.4. The problem is that most of the standard methods are platform dependent, as they require MAC addresses or a "good" random source, for instance. I'm not sure how we wanted to solve that, but certainly leaving the uuid type with *no* method to generate one is pretty poor. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Group Commit
Heikki Linnakangas <[EMAIL PROTECTED]> writes: > I've refrained from spending time on group commit until the > commit-no-wait patch lands, because it's going to conflict anyway. I'm > starting to feel we should not try to rush group commit into 8.3, unless > it somehow falls out of the commit-no-wait patch by accident, given that > we're past feature freeze and coming up with a proper group commit > algorithm would need a lot of research and testing. Better do it for 8.4 > with more time, we've got enough features on plate for 8.3 anyway. It's possible that it *would* fall out of commit-no-wait, if we are alert to the possibility of shaking the tree in the right direction ;-) Otherwise I agree with waiting till 8.4 to deal with it. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] \da doesn't show result type
Am Dienstag, 10. April 2007 17:12 schrieb Magnus Hagander: > > > Is there a reason that \da doesn't show the functions result type, like > > > \df does? I would find that information useful. > Eh, but it does? As of > http://archives.postgresql.org/pgsql-committers/2007-03/msg00138.php. Ah, OK, one feature in 8.3 that I can use. :) -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] \da doesn't show result type
On Tue, Apr 10, 2007 at 10:32:38AM -0400, Tom Lane wrote: > Peter Eisentraut <[EMAIL PROTECTED]> writes: > > Is there a reason that \da doesn't show the functions result type, like \df > > does? I would find that information useful. > > +1, I've been annoyed by that too. Eh, but it does? As of http://archives.postgresql.org/pgsql-committers/2007-03/msg00138.php. Or am I missing what you're asking for here? :-) //Magnus ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Idle idea for a feature
* Tom Lane ([EMAIL PROTECTED]) wrote: > one). ISTM it'd be a good idea if it did, as "are there any incoming > foreign keys" seems to be a question we constantly ask when solving > update-performance problems, and there isn't any easy way to check for > such. I'm not real sure what the printout should look like, though. Agreed. Suggestion: networx=> \d wdm_networx.loc_base_clin Table "wdm_networx.loc_base_clin" Column| Type | Modifiers -+-+--- btable_id | integer | not null loc_base_id | integer | not null clin| integer | not null Indexes: "loc_base_clin_pkey" PRIMARY KEY, btree (btable_id, loc_base_id, clin) "btable_id_clin_unique" UNIQUE, btree (btable_id, clin) Foreign-key constraints: "loc_base_clin_btable_id_fkey" FOREIGN KEY (btable_id) REFERENCES wdm_networx.btables_ref(btable_id) "loc_base_clin_clin_fkey" FOREIGN KEY (clin) REFERENCES wdm_networx.clin(clin) "loc_base_clin_loc_base_id_fkey" FOREIGN KEY (loc_base_id) REFERENCES wdm_networx.loc_base_dscr(loc_base_id) networx=> \d wdm_networx.loc_base_dscr Table "wdm_networx.loc_base_dscr" Column | Type | Modifiers ++--- loc_base_id| integer| not null description| character varying(254) | not null locations | character varying(254) | univ_mandatory | character varying(254) | ent_mandatory | character varying(254) | Indexes: "loc_base_dscr_pkey" PRIMARY KEY, btree (loc_base_id) Referenced by: "loc_base_clin_loc_base_id_fkey" FOREIGN KEY (loc_base_id) BY wdm_networx.loc_base_clin(loc_base_id) /|\/|\ Referenced column(s) in *this* table <| |---> column(s) in referencing table Just my 2c. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] \da doesn't show result type
Peter Eisentraut <[EMAIL PROTECTED]> writes: > Is there a reason that \da doesn't show the functions result type, like \df > does? I would find that information useful. +1, I've been annoyed by that too. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] TOASTing smaller things
Hi Bruce, How about these: - Allow specification of TOAST size threshold in bytes on a per column basis - Enable storage of columns in separate TOAST tables - Enable use of multi-row compression method(s) for TOAST tables - Luke On 3/26/07 5:39 PM, "Bruce Momjian" <[EMAIL PROTECTED]> wrote: > Luke Lonergan wrote: >> I advocate the following: >> >> - Enable specification of TOAST policy on a per column basis >> >> As a first step, then: >> >> - Enable vertical partitioning of tables using per-column specification of >> storage policy. >> > > How are these different from ALTER TABLE SET STORAGE? They need to be > more specific. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Anyone interested in improving postgresql scaling?
Mark Kirkwood <[EMAIL PROTECTED]> writes: > Kris Kennaway wrote: >> If so, then your task is the following: >> >> Make SYSV semaphores less dumb about process wakeups. Currently >> whenever the semaphore state changes, all processes sleeping on the >> semaphore are woken, even if we only have released enough resources >> for one waiting process to claim. i.e. there is a thundering herd >> wakeup situation which destroys performance at high loads. Fixing >> this will involve replacing the wakeup() calls with appropriate >> amounts of wakeup_one(). > I'm forwarding this to the pgsql-hackers list so that folks more > qualified than I can comment, but as I understand the way postgres > implements locking each process has it *own* semaphore it waits on - > and who is waiting for what is controlled by an in (shared) memory hash > of lock structs (access to these is controlled via platform Dependant > spinlock code). So a given semaphore state change should only involve > one process wakeup. Correct. The behavior Kris describes is surely bad, but it's not relevant to Postgres' usage of SysV semaphores. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Idle idea for a feature
psql's \d command tells you about outgoing foreign key constraints (ie, ones referencing another table from this one). It doesn't tell you about incoming ones (ie, ones where another table references this one). ISTM it'd be a good idea if it did, as "are there any incoming foreign keys" seems to be a question we constantly ask when solving update-performance problems, and there isn't any easy way to check for such. I'm not real sure what the printout should look like, though. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] "select ('{}'::text[])[1]" returns NULL -- is it correct?
"Nikolay Samokhvalov" <[EMAIL PROTECTED]> writes: > I remember several cases when people (e.g. me :-) ) were spending some > time trying to find an error in some pl/pgsql function and the reason > lied in incorrect work with arrays (i.e. messages like "index is out > of bounds" and "index cannot be negative number" would help, surely). Well, if indexes *couldn't* be negative numbers then that might be helpful, but they can. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] \da doesn't show result type
Is there a reason that \da doesn't show the functions result type, like \df does? I would find that information useful. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Anyone interested in improving postgresql scaling?
Kris Kennaway wrote: If so, then your task is the following: Make SYSV semaphores less dumb about process wakeups. Currently whenever the semaphore state changes, all processes sleeping on the semaphore are woken, even if we only have released enough resources for one waiting process to claim. i.e. there is a thundering herd wakeup situation which destroys performance at high loads. Fixing this will involve replacing the wakeup() calls with appropriate amounts of wakeup_one(). I'm forwarding this to the pgsql-hackers list so that folks more qualified than I can comment, but as I understand the way postgres implements locking each process has it *own* semaphore it waits on - and who is waiting for what is controlled by an in (shared) memory hash of lock structs (access to these is controlled via platform Dependant spinlock code). So a given semaphore state change should only involve one process wakeup. Cheers Mark ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Group Commit
Tom Lane wrote: Heikki Linnakangas <[EMAIL PROTECTED]> writes: I've been working on the patch to enhance our group commit behavior. The patch is a dirty hack at the moment, but I'm settled on the algorithm I'm going to use and I know the issues involved. One question that just came to mind is whether Simon's no-commit-wait patch doesn't fundamentally alter the context of discussion for this. Aside from the prospect that people won't really care about group commit if they can just use the periodic-WAL-sync approach, ISTM that one way to get group commit is to just make everybody wait for the dedicated WAL writer to write their commit record. With a sufficiently short delay between write/fsync attempts in the background process, won't that net out at about the same place as a complicated group-commit patch? Possibly. To get efficient group commit there would need to be some kind of signaling between the WAL writer and normal backends. I think there is some in the patch, but I'm not sure if it gives efficient group commit. A constant delay will just give us something similar to commit_delay. I've refrained from spending time on group commit until the commit-no-wait patch lands, because it's going to conflict anyway. I'm starting to feel we should not try to rush group commit into 8.3, unless it somehow falls out of the commit-no-wait patch by accident, given that we're past feature freeze and coming up with a proper group commit algorithm would need a lot of research and testing. Better do it for 8.4 with more time, we've got enough features on plate for 8.3 anyway. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Group Commit
> > > I've been working on the patch to enhance our group commit behavior. > > > The patch is a dirty hack at the moment, but I'm settled on the > > > algorithm I'm going to use and I know the issues involved. > > > > One question that just came to mind is whether Simon's no-commit-wait > > patch doesn't fundamentally alter the context of discussion for this. > > Aside from the prospect that people won't really care about group > > commit if they can just use the periodic-WAL-sync approach, ISTM that > > one way to get group commit is to just make everybody wait for the > > dedicated WAL writer to write their commit record. Yes good catch, I think we will want to merge the two. But, you won't want to wait indefinitely, since imho the dedicated WAL writer will primarily only want to write/flush full WAL pages. Maybe flush half full WAL pages only after some longer timeout. But basically this timeout should be longer than an individual backend is willing to delay their commit. > > With a > > sufficiently short delay between write/fsync attempts in the > > background process, won't that net out at about the same place as a > > complicated group-commit patch? I don't think we want the delay so short, or we won't get any grouped writes. I think what we could do is wait up to commit_delay for the dedicated WAL writer to do it's work. If it did'nt do it until timeout let the backend do the flushing itself. > I think the big question is whether commit_delay is ever going to be generally useful. It is designed to allow a higher transaction/second rate on a constantly WAL bottlenecked system, so I think it still has a use case. I think you should not compare it to no-commit-wait from the feature side (only implementation). Andreas ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq