Re: [HACKERS] PostgreSQL 7.1 on AIX 5L is running with too poor performance
> Leslie [EMAIL PROTECTED] writes:
>> PostgreSQL 7.1 is now running on AIX 5L (S85, 6GB memory, 6 CPUs), which was running on Linux before (Pentium III, 2 CPUs, as far as I remember... sorry). The performance on AIX 5L is just half as good as it was on Linux.
>
> Hmm ... is the AIX compilation selecting an appropriate TAS implementation for spinlocks?

I think yes. I have compiled 7.1 on an AIX 5L box and found that TAS() was replaced by:

    cs((int *) (lock), 0, 1)

> If it's falling back to semaphore-based spinlocks, I can easily believe that you might take a 2X performance hit. Possibly s_lock.h needs some additional #if tests for AIX 5.

-- Tatsuo Ishii

---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] SIGCHLD handler in Postgres C function.
On Sun, 22 Jul 2001, Tatsuo Ishii wrote:
>> [EMAIL PROTECTED] writes: I have written a postgres C function that uses a popen() call. Originally, when I first tried it, I kept getting an ECHILD. I read a little bit more on the pclose() function and the wait system calls and discovered that on Linux, if the signal handler for SIGCHLD is set to SIG_IGN, you will get the ECHILD error on pclose() (or wait4(), for that matter). So I did some snooping around in the postgres backend code and found, in the traffic cop, that the SIGCHLD signal handler is set to SIG_IGN. So in my C function, right before the popen() call, I set the signal handler for SIGCHLD to SIG_DFL, and right after the pclose() I set it back to SIG_IGN. I tested this and it seems to solve my problem.
>
> Just ignore ECHILD. It's not messy at all. :-)

It sounds like your kernel is using SIG_IGN to do the same thing as the SA_NOCLDWAIT flag in *BSD (well, NetBSD at least). When a child dies, it gets re-parented to init (which is wait()ing). init does the child-died cleanup, rather than the parent needing to. That way, when the parent runs wait(), there is no child, so you get an ECHILD. All ECHILD is doing is saying there was no child. Since we aren't really waiting for the child, I don't see how that's a problem.

Take care, Bill
Re: [HACKERS] SIGCHLD handler in Postgres C function.
Bill Studenmund [EMAIL PROTECTED] writes:
> All ECHILD is doing is saying there was no child. Since we aren't really waiting for the child, I don't see how that's a problem.

You're missing the point: on some platforms the system() call is returning a failure indication because of ECHILD. It's system() that's broken, not us, and the issue is how to work around its brokenness without sacrificing more error detection than we have to.

regards, tom lane
RE: [HACKERS] Performance TODO items
> 3) I am reading the Solaris Internals book and there is mention of a free-behind capability with large sequential scans. When a large sequential scan happens that would wipe out all the old cache entries, the kernel detects this and places its previous pages first on the free list. For our code, if we do a sequential scan of a table that is larger than our buffer cache size, I think we should detect this and do the same. See http://techdocs.postgresql.org for my performance paper for an example. New TODO entries are:
> * Order duplicate index entries by tid
> * Add queue of backends waiting for spinlock
> * Add free-behind capability for large sequential scans

So why do we cache sequentially-read pages at all? Or at least, why not have an option to control it? Oracle (to the best of my knowledge) does NOT cache pages read by a sequential scan, for at least two reasons/assumptions (two being all that I can recall):

1. Caching pages for sequential scans over sufficiently large tables will just cycle the cache. The pages that will be cached at the end of the query will be the last N pages of the table, so when the same sequential query is run again, the scan from the beginning of the table will start flushing the oldest cached pages, which are more than likely going to be the ones that will be needed at the end of the scan, etc. In a multi-user environment, the effect is worse.

2. Concurrent or consecutive queries in a dynamic database will not generate plans that use the same sequential scans, so they will tend to thrash the cache.

Now, there are some databases where the same general queries are run time after time, and caching the pages from sequential scans does make sense. But in larger, enterprise-type systems, indices are created to help speed up the most-used queries, and the sequential cache entries only serve to clutter the cache and flush the useful pages. Is there any way that caching pages read in by a sequential scan could be made a configurable option?
Any chance someone could run pgbench on a test system set up to not cache sequential reads?

Darren
Re: [HACKERS] SIGCHLD handler in Postgres C function.
On Mon, 30 Jul 2001, Tom Lane wrote:
> Bill Studenmund [EMAIL PROTECTED] writes:
>> All ECHILD is doing is saying there was no child. Since we aren't really waiting for the child, I don't see how that's a problem.
>
> You're missing the point: on some platforms the system() call is returning a failure indication because of ECHILD. It's system() that's broken, not us, and the issue is how to work around its brokenness without sacrificing more error detection than we have to.

I think I do get the point. But perhaps I didn't make my point well. :-) I think the problem is that on some OSs, setting SIGCHLD to SIG_IGN actually triggers automatic child reaping. So the problem is that we are: 1) setting SIGCHLD to SIG_IGN, 2) calling system(), and 3) thinking ECHILD means something was really wrong. I think 4.4BSD systems will do what we expect (as the SA_NOCLDWAIT flag requests child reaping), but Linux systems will give us the ECHILD.

Looking at source on the web, I found, at kernel/signal.c:1042:

 * Note the silly behaviour of SIGCHLD: SIG_IGN means that the
 * signal isn't actually ignored, but does automatic child
 * reaping, while SIG_DFL is explicitly said by POSIX to force
 * the signal to be ignored.

So we get automatic reaping on Linux systems (which isn't bad). If automatic reaping happens, system() will give us an ECHILD, as the waitpid() (or equivalent) will not have found a child. :-) My suggestion is to just leave the ifs as if ((error == 0) || (error == ECHILD)) (or the inverse).

Take care, Bill
Re: [HACKERS] SIGCHLD handler in Postgres C function.
Bill Studenmund [EMAIL PROTECTED] writes:
> Looking at source on the web, I found, at kernel/signal.c:1042:
>  * Note the silly behaviour of SIGCHLD: SIG_IGN means that the
>  * signal isn't actually ignored, but does automatic child
>  * reaping, while SIG_DFL is explicitly said by POSIX to force
>  * the signal to be ignored.

Hmm, interesting. If you'll recall, the start of this thread was a proposal to change our backends' handling of SIGCHLD from SIG_IGN to SIG_DFL (and get rid of explicit tests for ECHILD). I didn't quite see why changing the handler should make a difference, but above we seem to have the smoking gun. Which kernel, and which version, is the above quote from?

regards, tom lane
Re: [HACKERS] SIGCHLD handler in Postgres C function.
Bill Studenmund [EMAIL PROTECTED] writes:
> I see three choices:
>
> 1) Change back to SIG_DFL for normal behavior. I think this will be fine, as we run without problems on systems that lack this behavior. If turning off automatic child reaping would cause a problem, we'd have seen it already on the OSs which don't automatically reap children. Will a backend ever fork after it's started?

Backends never fork more backends --- but there are some places that launch transient children and wait for them to finish. A non-transient subprocess should always be launched by the postmaster, never by a backend, IMHO.

> 2) Change to SIG_DFL around system() and then change back.

I think this is pretty ugly, and unnecessary.

> 3) Realize that ECHILD means that the child was auto-reaped (which is an OK thing and, I think, will only happen if the child exited without error).

That's the behavior that's in place now, but I do not like it. We should not need to code an assumption that this error isn't really an error --- especially when it only happens on some platforms. On a non-Linux kernel, an ECHILD failure really would be a failure, and the existing code would fail to detect that there was a problem.

Bottom line: I like solution #1. Does anyone have an objection to it?

regards, tom lane
[HACKERS] developer's website
The developer's corner will soon be going away. I'm in the process of putting together a developer's site. Different URL, different look; beta announcements will be there, the regression database will be there, development docs, etc. If you want a sneak preview: http://developer.postgresql.org/

Vince.
--
==
Vince Vielhaber -- KA8CSH   email: [EMAIL PROTECTED]   http://www.pop4.net
56K Nationwide Dialup from $16.00/mo at Pop4 Networking
Online Campground Directory   http://www.camping-usa.com
Online Giftshop Superstore    http://www.cloudninegifts.com
==
Re: [HACKERS] SIGCHLD handler in Postgres C function.
On Mon, 30 Jul 2001, Tom Lane wrote:
> Bill Studenmund [EMAIL PROTECTED] writes:
>> Looking at source on the web, I found, at kernel/signal.c:1042:
>>  * Note the silly behaviour of SIGCHLD: SIG_IGN means that the
>>  * signal isn't actually ignored, but does automatic child
>>  * reaping, while SIG_DFL is explicitly said by POSIX to force
>>  * the signal to be ignored.
>
> Hmm, interesting. If you'll recall, the start of this thread was a proposal to change our backends' handling of SIGCHLD from SIG_IGN to SIG_DFL (and get rid of explicit tests for ECHILD). I didn't quite see why changing the handler should make a difference, but above we seem to have the smoking gun. Which kernel, and which version, is the above quote from?

Linux kernel source, 2.4.3; I think the i386 version (though it should be the same for this bit -- it's supposed to be machine-independent). Check out http://lxr.linux.no/source/

I do recall the reason for the thread. :-) I see three choices:

1) Change back to SIG_DFL for normal behavior. I think this will be fine, as we run without problems on systems that lack this behavior. If turning off automatic child reaping would cause a problem, we'd have seen it already on the OSs which don't automatically reap children. Will a backend ever fork after it's started?

2) Change to SIG_DFL around system() and then change back.

3) Realize that ECHILD means that the child was auto-reaped (which is an OK thing and, I think, will only happen if the child exited without error).

Take care, Bill
Re: [HACKERS] vacuumlo.
Tom Lane wrote:
> Grant [EMAIL PROTECTED] writes:
>> Can you see a scenario where a programmer would forget to delete the data from pg_largeobject and the database becoming very large, filled with orphaned large objects?
>
> Sure. My point wasn't that the functionality isn't needed, it's that I'm not sure vacuumlo does it well enough to be ready to promote to the status of mainstream code. It needs more review and testing before we can move it out of /contrib.

IIRC, vacuumlo doesn't take the type lo (see contrib/lo) into account. I doubt that vacuumlo is reliable.

regards, Hiroshi Inoue
Re: [HACKERS] vacuumlo.
>>> Can you see a scenario where a programmer would forget to delete the data from pg_largeobject and the database becoming very large, filled with orphaned large objects?
>>
>> Sure. My point wasn't that the functionality isn't needed, it's that I'm not sure vacuumlo does it well enough to be ready to promote to the status of mainstream code. It needs more review and testing before we can move it out of /contrib.
>
> IIRC, vacuumlo doesn't take the type lo (see contrib/lo) into account. I doubt that vacuumlo is reliable.

This was my roundabout way of asking if something to combat this issue can be placed on the TODO list. :) Thanks.
[HACKERS] Re: Returned mail: User unknown
Is it just me, or is an address on the hackers list whose mail is handled by wmail.metro.taejon.kr nonexistent?

On Tue, 31 Jul 2001, Mail Delivery Subsystem wrote:
> The original message was received at Tue, 31 Jul 2001 14:25:00 +1000 (EST) from IDENT:[EMAIL PROTECTED]
> -- The following addresses had permanent fatal errors --
> [EMAIL PROTECTED]
Re: [HACKERS] Re: Returned mail: User unknown
Grant [EMAIL PROTECTED] writes:
> Is it just me, or is an address on the hackers list whose mail is handled by wmail.metro.taejon.kr nonexistent?

I've had to institute a sendmail access block against that site :-( It bounces a useless complaint for every damn posting I make. What's worse is that it looks like it's trying to deliver extra copies to the people named in the To:/CC: lines --- if it somehow fails to fail to deliver those copies, it's spamming.

Yo Marc, are you awake? These losers should be blocked from our lists permanently (or at least till they install less-broken mail software).

regards, tom lane
RE: [HACKERS] Performance TODO items
>>> * Order duplicate index entries by tid
>>
>> In other words - add tid to index key: very old idea.
>
> I was thinking during index creation, it would be nice to order them by tid, but not do lots of work to keep it that way.

I hear this "not do lots of work" so often from you -:) Days of simplicity are gone, Bruce. To continue, this project requires more and more complex solutions.

>>> * Add queue of backends waiting for spinlock
>>
>> We shouldn't mix two different approaches for different kinds of short-time internal locks - in some cases we need a light lmgr (when we're going to keep a lock long enough, e.g. for I/O), and in other cases we'd do better with POSIX mutexes or semaphores instead of spinlocks. Queueing backends waiting for a spinlock sounds like nonsense - how are you going to protect such a queue? With spinlocks? -:)
>
> Yes, I guess so, but hopefully we can spin waiting for the queue lock rather than sleep. We could use POSIX spinlocks/semaphores now but we don't because of performance, right?

No. As long as no one has proved with a test that mutexes are bad for performance... Funny, such a test would require ~1 day of work.

> Should we be spinning waiting for spinlock on multi-cpu machines? Is that the answer?

What do you mean?

Vadim
Re: [HACKERS] Performance TODO items
>>>> * Order duplicate index entries by tid
>>>
>>> In other words - add tid to index key: very old idea.
>>
>> I was thinking during index creation, it would be nice to order them by tid, but not do lots of work to keep it that way.
>
> I hear this "not do lots of work" so often from you -:) Days of simplicity are gone, Bruce. To continue, this project requires more and more complex solutions.

Yep. I can dream. :-)

>>>> * Add queue of backends waiting for spinlock
>>>
>>> We shouldn't mix two different approaches for different kinds of short-time internal locks - in some cases we need a light lmgr (when we're going to keep a lock long enough, e.g. for I/O), and in other cases we'd do better with POSIX mutexes or semaphores instead of spinlocks. Queueing backends waiting for a spinlock sounds like nonsense - how are you going to protect such a queue? With spinlocks? -:)
>>
>> Yes, I guess so, but hopefully we can spin waiting for the queue lock rather than sleep. We could use POSIX spinlocks/semaphores now but we don't because of performance, right?
>
> No. As long as no one has proved with a test that mutexes are bad for performance... Funny, such a test would require ~1 day of work.

Good question. I know the number of function calls to spinlock stuff is huge. Seems real semaphores may be a big win on multi-cpu boxes.

>> Should we be spinning waiting for spinlock on multi-cpu machines? Is that the answer?
>
> What do you mean?

On a single-cpu machine, if you can't get the spinlock, you should just sleep and let the process who had it continue. On a multi-cpu machine, you perhaps should just keep trying, hoping that the process who holds it finishes soon. I wonder if we should find some kind of test on postmaster startup that would detect whether we have multiple CPUs, and change from spinlock sleeping to spinlock spin-waiting accordingly. Add to this the fact that we can't sleep for less than one clock tick on most machines (10ms), and the spinlock stuff needs work.

TODO updated:

* Improve spinlock code, perhaps with OS semaphores, sleeper queue, or spinning to obtain lock on multi-cpu systems

-- Bruce Momjian | http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
RE: [HACKERS] Performance TODO items
>>> We could use POSIX spinlocks/semaphores now but we don't because of performance, right?
>>
>> No. As long as no one has proved with a test that mutexes are bad for performance... Funny, such a test would require ~1 day of work.
>
> Good question. I know the number of function calls to spinlock stuff is huge. Seems real semaphores may be a big win on multi-cpu boxes.

OK, being tired of endless discussions, I'll try to use mutexes instead of spinlocks and run pgbench on my Solaris WS 10 and E4500 (4-CPU) boxes.

Vadim
Re: [HACKERS] Re: From TODO, XML?
> I have managed to get several XML files into PostgreSQL by writing a parser, and it is a huge hassle; the public parsers are too picky. I am thinking that a fuzzy parser, combined with some intelligence and an XML DTD reader, could make a very cool utility, one which I have not been able to find. Perhaps it is a two-stage process? The first pass creates a schema which can be modified/corrected; the second pass loads the data.
>
> Can we accept only relational XML? Does that buy us anything? Are the other database vendors outputting hierarchical XML? Are they using foreign/primary keys to do it? Then what's the point? Almost no one creates non-hierarchical XML. For the utility to be useful, beyond just a different format for pg_dump, it has to deal with these issues and do the right thing.

Oh, seems XML will be much more complicated than I thought.
Re: [HACKERS] OU8...
Patch applied. Thanks. Still needs updated autoconf.

> Skip the patch for configure.in in that last one; use this in its place (I missed one sysv5uw).
>
> -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED] US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

[ Attachment, skipping... ]
Re: [HACKERS] Re: Re: Storing XML in PostgreSQL
Added to /contrib, with small Makefile changes. Requires the expat library. Does not compile by default.

> I've packaged up what I've done so far and you can find it at http://www.cabbage.uklinux.net/pgxml.tar.gz The TODO file included indicates what still remains to be done (a lot!). In particular, it would be good to implement more of the XPath grammar. However, once we get into the realm of more complex paths there becomes a question about optimisation of XPath selection. If the documents are pre-parsed, then XPath query elements can be rewritten as SQL queries and you get the optimisation of the planner on your side. I'd like to stick with the current solution if possible, because I think it delivers a very simple interface to the user and is (code-wise) also very straightforward. Maybe less efficient queries are a penalty worth paying? Any thoughts?
>
> Regards John
[HACKERS] Re: From TODO, XML?
Bruce Momjian wrote:
> I have been fighting, for a while now, with idiot data vendors that think XML is a cure-all. The problem is that XML is a hierarchical format, whereas SQL is a relational format. It would be good to get pg_dump to write an XML file and DTD, but getting external sources of XML into PostgreSQL is WAY more complicated. If an XML import is to be useful beyond just a different format for pg_dump, there has to be some intelligent database construction based on the XML information. Go to mp3.com and download some of their XML-format data: first, it is bad XML; second, it is hierarchical. I have managed to get several XML files into PostgreSQL by writing a parser, and it is a huge hassle; the public parsers are too picky. I am thinking that a fuzzy parser, combined with some intelligence and an XML DTD reader, could make a very cool utility, one which I have not been able to find. Perhaps it is a two-stage process? The first pass creates a schema which can be modified/corrected; the second pass loads the data. Can we accept only relational XML? Does that buy us anything? Are the other database vendors outputting hierarchical XML? Are they using foreign/primary keys to do it? Then what's the point? Almost no one creates non-hierarchical XML. For the utility to be useful, beyond just a different format for pg_dump, it has to deal with these issues and do the right thing.

--
5-4-3-2-1 Thunderbirds are GO!
http://www.mohawksoft.com
Re: [HACKERS] Re: From TODO, XML?
On Mon, Jul 30, 2001 at 03:43:26PM +1000, Gavin Sherry wrote:
> On Mon, 30 Jul 2001, mlw wrote:
>> I have had the same problem. The best XML parser I could find was the gnome-xml library at xmlsoft.org (libxml). I am currently using this in C

What happens if you use a DOM-type XML parser on a large file? A dump from an SQL DB can be really large. IMHO, for this (a data dump from an SQL DB) a SAX-type XML parser is better.

> an XSL stylesheet to transform an XML based database dump from some third

Yes, that's the right way to use XML.

Karel

--
Karel Zak [EMAIL PROTECTED] http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
[HACKERS] Performance TODO items
I have thought of a few new TODO performance items:

1) Someone at O'Reilly suggested that we order our duplicate index entries by tid, so if we are hitting the heap for lots of duplicates, the hits will be on sequential pages. Seems like a nice idea.

2) After Tatsuo's report of running 1000 backends on pgbench, and from a Solaris report, I think we should have a queue of backends waiting for a spinlock, rather than having them sleep and try again. This is particularly important for multi-processor machines.

3) I am reading the Solaris Internals book and there is mention of a free-behind capability with large sequential scans. When a large sequential scan happens that would wipe out all the old cache entries, the kernel detects this and places its previous pages first on the free list. For our code, if we do a sequential scan of a table that is larger than our buffer cache size, I think we should detect this and do the same. See http://techdocs.postgresql.org for my performance paper for an example.

New TODO entries are:

* Order duplicate index entries by tid
* Add queue of backends waiting for spinlock
* Add free-behind capability for large sequential scans

I will modify them with any comments people have.
[HACKERS] Portal question
Hi all,

This is the situation: you are doing a big query, but you want the results on the web page to be paginated, i.e. the user can click page 1, 2, etc. So you need to know how many rows total would be returned, but you also only need a small fraction of them. What is an efficient way of doing this?

It seems to me that using a CURSOR would be advantageous; however, once a CURSOR is opened, how do you get the full row count? i.e. Can you do this?

1. Declare a cursor
2. Find the total number of rows returned
3. Fetch the subset of the rows that are required
4. Construct a pagination based on the info from 2 and 3

If this can't be done, how do you do it? Is the only way to repeat the whole query twice, the first time doing a count(*) instead of selecting the variables?

Thanks, Chris
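One single-pass approach with a scrollable cursor is sketched below. The table, column, and cursor names are made up; the idea is that MOVE's command status tag reports how many rows it skipped, which yields the total row count without re-running the query:

```sql
BEGIN;
DECLARE page_cur SCROLL CURSOR FOR
    SELECT * FROM items ORDER BY id;

-- Run to the end once; the command tag ("MOVE n") gives the total row count.
MOVE FORWARD ALL IN page_cur;

-- Reposition just before the first row of page 3 (at 10 rows per page).
MOVE ABSOLUTE 20 IN page_cur;
FETCH 10 FROM page_cur;   -- rows 21..30

COMMIT;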
Re: [HACKERS] Performance TODO items
Bruce Momjian [EMAIL PROTECTED] writes:
> New TODO entries are:
> * Add queue of backends waiting for spinlock

I already see:

* Create spinlock sleepers queue so everyone doesn't wake up at once

BTW, I agree with Vadim's opinion that we should add a new type of lock (intermediate between spinlocks and lock manager locks) rather than try to add new semantics onto spinlocks. For example, it'd be very nice to distinguish read-only and read-write access in this new kind of lock, but we can't expect spinlocks to be able to do that.

regards, tom lane
Re: [HACKERS] Performance TODO items
Bruce Momjian [EMAIL PROTECTED] writes:
> Should we be spinning waiting for spinlock on multi-cpu machines? Is that the answer?

A multi-CPU machine is actually the only case where a true spinlock *does* make sense. On a single CPU you might as well yield the CPU immediately, because you have no chance of getting the lock until the current holder is allowed to run again. On a multi-CPU machine it's a reasonable guess that the current holder is running on one of the other CPUs and may release the lock "soon" (soon == less than a process dispatch cycle, hence busy-wait is better than releasing the CPU).

We are currently using spinlocks for a lot of situations where the mean time spent holding the lock is probably larger than "soon" as defined above. We should have a different lock implementation for those cases. True spinlocks should be reserved for protecting code where the maximum time spent holding the lock is guaranteed to be very short.

regards, tom lane
Re: [HACKERS] Performance TODO items
> Bruce Momjian [EMAIL PROTECTED] writes:
>> New TODO entries are:
>> * Add queue of backends waiting for spinlock
>
> I already see:
> * Create spinlock sleepers queue so everyone doesn't wake up at once

That is an old copy of the TODO. I reworded it. You will only see this now:

* Improve spinlock code, perhaps with OS semaphores, sleeper queue, or spinning to obtain lock on multi-cpu systems

> BTW, I agree with Vadim's opinion that we should add a new type of lock (intermediate between spinlocks and lock manager locks) rather than try to add new semantics onto spinlocks. For example, it'd be very nice to distinguish read-only and read-write access in this new kind of lock, but we can't expect spinlocks to be able to do that.

Yes, I agree too. On a uniprocessor machine, if I can't get the spinlock, I want to yield the CPU, ideally for less than 10ms. On a multi-CPU machine, if the lock is held by another CPU and that process is running, we want to spin waiting for the lock to free. If not, we can sleep. We basically need some more sophisticated semantics around these locks, or to move to OS semaphores. In fact, can't we sleep on an interruptible system call, and send signals to processes when we release the lock? That way we don't have the 10ms sleep limitation. One idea is to have a byte for each backend in shared memory and have each backend set it if it is waiting for the semaphore. There would be no contention with multiple backends registering their sleep at the same time. We have seen reports of 4-CPU systems having poor performance while the system is only 12% busy, perhaps because the processes are all sleeping waiting for the next tick. I think multi-cpu machines are going to give us new challenges.
Re: [HACKERS] Performance TODO items
Bruce Momjian [EMAIL PROTECTED] writes:
> I have thought of a few new TODO performance items: 1) Someone at O'Reilly suggested that we order our duplicate index entries by tid, so if we are hitting the heap for lots of duplicates, the hits will be on sequential pages. Seems like a nice idea.

A more general solution is for indexscan to collect up a bunch of TIDs from the index, sort them in-memory by TID order, and then probe into the heap with those TIDs. This is better than the above because you get nice ordering of the heap accesses across multiple key values, not just among the tuples with the same key. (In a unique or near-unique index, the above idea is nearly worthless.) In the best case there are few enough TIDs retrieved from the index that you can just do this once, but even if there are lots of TIDs, it should be a win to do this in batches of a few thousand TIDs. Essentially we decouple indexscans into separate index-access and heap-access phases.

One big problem is that this doesn't interact well with concurrent VACUUM: our present solution for concurrent VACUUM assumes that indexscans hold a pin on an index page until they've finished fetching the pointed-to heap tuples. Another objection is that we'd have a harder time implementing the TODO item of marking an indextuple dead when its associated heaptuple is dead. Anyone see a way around these problems?

regards, tom lane
Re: [HACKERS] Performance TODO items
> A more general solution is for indexscan to collect up a bunch of TIDs
> from the index, sort them in-memory by TID order, and then probe into
> the heap with those TIDs.  [...]
> One big problem is that this doesn't interact well with concurrent
> VACUUM.  [...]  Anyone see a way around these problems?

Interesting.  I figured the cache could keep most pages in such a case.
I was thinking more of helping file system readahead by requesting the
earlier block first in a multi-block request.  Not sure how valuable
that would be.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 853-3000
[HACKERS] Autoconf 2.50 is out (was Re: [PATCHES] Re: OpenUnix 8 Patch)
I wrote:
> If autoconf releases were happening on a regular basis, we could get
> away with just tracking the released version of autoconf for these
> files.  However, they aren't and we can't.

Just moments after writing that, I was startled to read on another
mailing list that the long-mythical Autoconf 2.50 has been released!

We should probably consider updating from autoconf 2.13 as our project
standard to 2.50.  However, I'd recommend waiting till Peter E. returns
from his vacation to see what his opinion about it is.  IIRC, he's been
following that project, which I have not been.

			regards, tom lane
[HACKERS] Re: Autoconf 2.50 is out (was Re: [PATCHES] Re: OpenUnix 8 Patch)
* Tom Lane [EMAIL PROTECTED] [010730 18:34]:
> Just moments after writing that, I was startled to read on another
> mailing list that the long-mythical Autoconf 2.50 is released!
>
> We should probably consider updating from autoconf 2.13 as our project
> standard to 2.50.  However, I'd recommend waiting till Peter E. returns
> from his vacation to see what his opinion about it is.  IIRC, he's been
> following that project, which I have not been.
>
> 			regards, tom lane

I also see LOTS of complaints about compatibility issues, which is why I
just pulled the 2 files from ftp://ftp.gnu.org/pub/gnu/config when I
generated that patch.  Looks like they updated them again today:

$ ftp ftp.gnu.org
Connected to gnuftp.gnu.org.
220 ProFTPD 1.2.0pre10 Server (ProFTPD) [gnuftp.gnu.org]
331 Anonymous login ok, send your complete e-mail address as password.
230-If you have any problems with the GNU software or its downloading,
please refer your questions to [EMAIL PROTECTED].  There are several
mirrors of this archive; a complete list can be found on
http://www.gnu.org/order/ftp.html.  Please use one of the mirrors if
possible.  Archives of the GNU mailing lists can be found in
ftp://ftp-mailing-list-archives.gnu.org/.

Please note that the directory structure on ftp.gnu.org was reorganized
fairly recently, such that there is a directory for each program.  One
side effect of this is that if you cd into the gnu directory and do
"ls emacs*" you will get a list of all the files in the emacs directory,
but it will not be obvious from the ls output that you have to
"cd emacs" before you can download those files.  Note further that the
non-GNU programs that were formerly in gnu/ have moved to gnu/non-gnu/.
Most of them were just pointers in the format program.README.  If you
are looking for such a file, be sure to check gnu/non-gnu/.
230 Anonymous access granted, restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd pub/gnu/config
250 CWD command successful.
ftp> pwd
257 /gnu/config is current directory.
ftp> dir
200 PORT command successful.
150 Opening ASCII mode data connection for file list.
-rw-r--r--   1 ftp   ftp   38214 Jul 30 14:00 config.guess
-rw-r--r--   1 ftp   ftp   27872 Jul 30 14:00 config.sub
226-Transfer complete.
226 Quotas off
ftp>

Larry
-- 
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 972-414-9812          E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
Re: [HACKERS] SIGCHLD handler in Postgres C function.
> Bruce Momjian [EMAIL PROTECTED] writes:
>> The auto-reaping is standard SysV behavior, while BSD is really
>> ignore.
>
> You'll recall the ECHILD exception was installed by Tatsuo after
> seeing problems on Solaris.  Evidently Solaris uses the auto-reap
> behavior too.

SVr4/Solaris took the SysV behavior.  Stevens didn't like it.  :-)

> I'm somewhat surprised that HPUX does not --- it tends to follow its
> SysV heritage when there's a conflict between that and BSD practice.
> Guess they went BSD on this one.

I thought HPUX was mostly SysV tools on a BSD kernel.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 853-3000
RE: [HACKERS] Performance TODO items
> New TODO entries are:
>
> * Order duplicate index entries by tid

In other words - add tid to index key: very old idea.

> * Add queue of backends waiting for spinlock

We shouldn't mix two different approaches for different kinds of
short-time internal locks: in some cases we need a light lmgr (when
we're going to keep a lock long enough, eg for IO), and in other cases
we'd do better to proceed with POSIX mutexes or semaphores instead of
spinlocks.

Queueing backends waiting for a spinlock sounds like nonsense - how are
you going to protect such a queue?  With spinlocks?  -:)

Vadim
Re: [HACKERS] developer's website
> The developer's corner will soon be going away.  I'm in the process of
> putting together a developer's site.  Different URL, different look;
> beta announcements will be there, the regression database will be
> there, development docs, etc.  If you want a sneak preview:
>
> http://developer.postgresql.org/

Right now, a good part of what I mirror (and my traffic) for the web
site is for the devel site.  Will a third set of mirrors (www / ftp /
devel) need to be set up, or will the main site handle all the load?

- Brandon

b. palmer, [EMAIL PROTECTED]    pgp: crimelabs.net/bpalmer.pgp5
Re: [HACKERS] developer's website
On Mon, 30 Jul 2001, bpalmer wrote:
>> The developer's corner will soon be going away.  I'm in the process of
>> putting together a developer's site.  [...]  If you want a sneak
>> preview:  http://developer.postgresql.org/
>
> Right now, a good part of what I mirror (and my traffic) for the web
> site is for the devel site.  Will a third set of mirrors (www / ftp /
> devel) need to be set up, or will the main site handle all the load?

Don't know just yet for sure.

Vince.
-- 
==========================================================================
Vince Vielhaber -- KA8CSH   email: [EMAIL PROTECTED]   http://www.pop4.net
  56K Nationwide Dialup from $16.00/mo at Pop4 Networking
  Online Campground Directory    http://www.camping-usa.com
  Online Giftshop Superstore     http://www.cloudninegifts.com
==========================================================================
Re: [HACKERS] SIGCHLD handler in Postgres C function.
Bruce Momjian [EMAIL PROTECTED] writes:
> The auto-reaping is standard SysV behavior, while BSD is really
> ignore.

You'll recall the ECHILD exception was installed by Tatsuo after seeing
problems on Solaris.  Evidently Solaris uses the auto-reap behavior too.

I'm somewhat surprised that HPUX does not --- it tends to follow its
SysV heritage when there's a conflict between that and BSD practice.
Guess they went BSD on this one.

			regards, tom lane
Re: [HACKERS] SIGCHLD handler in Postgres C function.
> Bill Studenmund [EMAIL PROTECTED] writes:
>> Looking at source on the web, I found:
>>
>> kernel/signal.c:1042
>>  * Note the silly behaviour of SIGCHLD: SIG_IGN means that the
>>  * signal isn't actually ignored, but does automatic child
>>  * reaping, while SIG_DFL is explicitly said by POSIX to force
>>  * the signal to be ignored.
>
> Hmm, interesting.  If you'll recall, the start of this thread was a
> proposal to change our backends' handling of SIGCHLD from SIG_IGN to
> SIG_DFL (and get rid of explicit tests for ECHILD).  I didn't quite see
> why changing the handler should make a difference, but above we seem to
> have the smoking gun.  Which kernel, and which version, is the above
> quote from?

The auto-reaping is standard SysV behavior, while BSD is really ignore.
See Stevens' Unix programming book for more info.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 853-3000
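[The save/restore workaround described at the top of the thread - set
SIG_DFL before the popen, put SIG_IGN back after the pclose - can be
sketched like this.  A Python stand-in for the C function, not actual
backend code; wait_with_default_sigchld is an invented name.  On Linux,
leaving SIG_IGN installed makes the kernel auto-reap the child, so the
subsequent wait() fails with ECHILD, which is what the workaround avoids.]

```python
import os
import signal

def wait_with_default_sigchld(child_exit_code=0):
    # Temporarily restore the default SIGCHLD disposition so our
    # wait() can reap the child itself, mirroring the C workaround.
    old = signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(child_exit_code)        # child: exit immediately
        _, status = os.waitpid(pid, 0)       # reaps normally under SIG_DFL
        return os.waitstatus_to_exitcode(status)
    finally:
        # Put back whatever handler the backend had installed (SIG_IGN).
        signal.signal(signal.SIGCHLD, old)
```

The alternative the thread settled on - just ignore ECHILD from pclose() -
avoids touching the handler at all, at the cost of treating an error return
as success in that one case.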
Re: [HACKERS] vacuumlo.
Grant [EMAIL PROTECTED] writes:
> Can you see a scenario where a programmer would forget to delete the
> data from pg_largeobject and the database becoming very large, filled
> with orphaned large objects?

Sure.  My point wasn't that the functionality isn't needed; it's that
I'm not sure vacuumlo does it well enough to be ready to promote to the
status of mainstream code.  It needs more review and testing before we
can move it out of /contrib.

			regards, tom lane
[HACKERS] vacuumlo.
I sent the email below to the creator of contrib/vacuumlo/ with no reply
just yet.  Is it possible to get his code included in the main vacuumdb
program, for support to vacuum orphaned large objects?  Or...  Any
suggestions?  What do people think?  Thanks.

-- Forwarded message --
Date: Tue, 24 Jul 2001 09:45:01 +1000 (EST)
From: Grant [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: vacuumlo.

G'day,

I've recently discovered that LOs do not get deleted when a record
referencing that OID is removed.  I'm assuming you created the program:
do you think it is possible to get this included as an argument for
vacuumdb?

vacuumdb -o db

Or something along those lines, so scanning for orphan large objects
could be done from the main vacuum binary?  What do you think?  Thanks.
Re: [HACKERS] SIGCHLD handler in Postgres C function.
Bruce Momjian [EMAIL PROTECTED] writes:
>> I'm somewhat surprised that HPUX does not --- it tends to follow its
>> SysV heritage when there's a conflict between that and BSD practice.
>> Guess they went BSD on this one.
> I thought HPUX was mostly SysV tools on BSD kernel.

No, it was all SysV (or maybe even older) to start with, and later on
they adopted BSD features wholesale.  But where there's a conflict, it's
still mostly SysV.

			regards, tom lane
Re: [HACKERS] vacuumlo.
Grant [EMAIL PROTECTED] writes:
> Is it possible to get [vacuumlo] included in the main vacuumdb program
> for support to vacuum orphaned large objects?

Hmm.  I'm not convinced that vacuumlo is ready for prime time... in
particular, how safe is it in the presence of concurrent transactions
that might be adding or removing LOs?

			regards, tom lane
Re: [HACKERS] vacuumlo.
>> Is it possible to get [vacuumlo] included in the main vacuumdb
>> program for support to vacuum orphaned large objects?
>
> Hmm.  I'm not convinced that vacuumlo is ready for prime time... in
> particular, how safe is it in the presence of concurrent transactions
> that might be adding or removing LOs?

I see that large objects for each database are stored in pg_largeobject,
referenced by the loid.  So when I delete a file from a table containing
an oid type, I have to make sure to delete the matching row(s) from
pg_largeobject.

Can you see a scenario where a programmer would forget to delete the
data from pg_largeobject and the database becoming very large, filled
with orphaned large objects?  Or am I on the wrong track?
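[What vacuumlo has to compute is, at bottom, a set difference.  A toy
sketch, not vacuumlo's actual code: find_orphaned_los is an invented name,
and the in-memory lists stand in for the results of the two catalog scans
a real implementation would run.]

```python
def find_orphaned_los(lo_oids, referenced_oids):
    """lo_oids: the distinct loid values present in pg_largeobject.
    referenced_oids: every OID value found in oid-typed columns of
    user tables.  Anything in the first set but not the second is an
    orphaned large object."""
    return sorted(set(lo_oids) - set(referenced_oids))

# Four stored LOs, two still referenced -> two orphans:
print(find_orphaned_los([100, 101, 102, 103], [101, 103]))
# -> [100, 102]
```

Tom's concurrency worry above lands exactly on this computation: a LO
created after the first scan but referenced before the second would look
orphaned, so a real tool must take a consistent snapshot (or suitable
locks) across both scans before deleting anything.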