Re: [PERFORM] Feature request: smarter use of conditional indexes
John Siracusa wrote:
>> Given an index like this:
>>     CREATE UNIQUE INDEX i1 ON t1 (c1) WHERE c1 IS NOT NULL;
>> and a query like this:
>>     SELECT * FROM t1 WHERE c1 = 123;
>> I'd like the planner to be smart enough to use an index scan using i1.

Tom Lane replied:
> Send a patch ;-) The routine you want to teach about this is
> pred_test_simple_clause() in src/backend/optimizer/path/indxpath.c.
> ISTM that it's legitimate to conclude that "foo IS NOT NULL" is implied
> by "foo op anything" or "anything op foo" if the operator is marked
> strict.

I've actually mentioned this one before, in that of all the partial indexes I have, almost all of them are of the WHERE x IS NOT NULL form. I don't know if that's a common use, but if it is, then maybe it's worth just adding the knowledge for IS NOT NULL...

The other thing is that at the moment, cascading foreign keys will not use partial indexes even if they match the predicate. Maybe an IS NOT NULL hack will help there...

Chris

---(end of broadcast)---
TIP 6: Have you searched our list archives?
       http://archives.postgresql.org
Re: [PERFORM] Scaling further up
On 03/03/2004 18:23 scott.marlowe wrote:
> [snip]
> There are three factors that affect how fast you can get to the next
> sector: seek time, settle time, and rotational latency. Most drives only
> list the first, and don't bother to mention the other two.

Ah yes, one of my (very) few still-functioning brain cells was nagging about another bit of time in the equation :)

> On many modern drives, the seek times are around 5 to 10 milliseconds.
> [snip]

Going back to the OP's posting about random_page_cost: imagine I have 2 servers identical in every way except the disk drive. Server A has a 10K rpm drive and server B has a 15K rpm drive. Seek/settle times aren't spectacularly different between the 2 drives. I'm wondering if drive B might actually merit a _higher_ random_page_cost than drive A as, once it gets settled on a disk track, it can suck the data off a lot faster.

Opinions/experiences anyone?

--
Paul Thomas
Thomas Micro Systems Limited | Software Solutions for the Smaller Business
Computer Consultants | http://www.thomas-micro-systems-ltd.co.uk
Re: [PERFORM] Feature request: smarter use of conditional indexes
John Siracusa <[EMAIL PROTECTED]> writes:
> Given an index like this:
>     CREATE UNIQUE INDEX i1 ON t1 (c1) WHERE c1 IS NOT NULL;
> and a query like this:
>     SELECT * FROM t1 WHERE c1 = 123;
> I'd like the planner to be smart enough to use an index scan using i1.

Send a patch ;-) The routine you want to teach about this is pred_test_simple_clause() in src/backend/optimizer/path/indxpath.c. ISTM that it's legitimate to conclude that "foo IS NOT NULL" is implied by "foo op anything" or "anything op foo" if the operator is marked strict.

Note: please patch against CVS head, as that code got rewritten since 7.4.

regards, tom lane
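[Editor's illustration] The strictness argument above can be sketched outside the planner. The following Python snippet (hypothetical names; the real logic is C code in pred_test_simple_clause()) shows the shape of the implication test: a strict operator returns NULL for NULL input, so a clause like "c1 = 123" can only be true when c1 is non-null, which is exactly what the index predicate requires.

```python
# Sketch of the predicate-implication rule discussed above:
# "foo IS NOT NULL" is implied by "foo op anything" or "anything op foo"
# when op is strict.  This is an illustration only, not the actual C code
# in src/backend/optimizer/path/indxpath.c.

from dataclasses import dataclass

@dataclass
class OpClause:
    op: str      # operator name, e.g. "="
    left: str    # left operand (column name or constant, as a string)
    right: str   # right operand

# Assume these comparison operators are marked strict in the catalog.
STRICT_OPS = {"=", "<", ">", "<=", ">="}

def implies_is_not_null(clause, column):
    """Does `clause` imply `column IS NOT NULL`?

    True when the operator is strict and the column appears as either
    operand: a strict operator yields NULL (not true) for NULL input.
    """
    return clause.op in STRICT_OPS and column in (clause.left, clause.right)

# The query's WHERE c1 = 123 implies the index predicate c1 IS NOT NULL:
print(implies_is_not_null(OpClause("=", "c1", "123"), "c1"))  # True
# A clause on a different column implies nothing about c1:
print(implies_is_not_null(OpClause("=", "c2", "123"), "c1"))  # False
```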
Re: [PERFORM] WAL Optimisation - configuration and usage
Simon Riggs wrote:
> On the other hand, I was just about to change the wal_debug behaviour
> to allow better debugging of PITR features as they're added.

That's a development activity. Enabling the WAL_DEBUG #ifdef by default during the 7.5 development cycle would be uncontroversial, I think.

> I think it is very important to be able to put the system fairly easily
> into debug mode

It is? Why would this be useful for non-development activities? (It may well be the case that we ought to report more or better information about the status of the WAL subsystem; but WAL_DEBUG is surely not the right mechanism for emitting information intended for an administrator.)

-Neil
Re: [HACKERS] [PERFORM] WAL Optimisation - configuration and usage
"Simon Riggs" <[EMAIL PROTECTED]> writes:
> The behaviour I wish to add is:
> Keep wal_debug as a value between 0 and 16.
> If = 0, then no debug output (default).
> Use the following bitmasks against the value:
> Mask 1 = XLOG checkpoints get logged
> Mask 2 = Archive API calls get logged
> Mask 4 = Transaction commits get logged
> Mask 8 = Flush & INSERTs get logged

I see no value in reverting Neil's change. The above looks way too much like old-line assembler-programmer thinking to me, anyway. Why not invent a separate, appropriately named boolean variable for each thing you want to control? Even C programmers manage to avoid doing the sort of mental arithmetic that the above would force onto DBAs.

As for whether it should be #ifdef'd or not, I'd have no objection to turning WAL_DEBUG on by default in pg_config_manual.h for the duration of PITR development. One should not however confuse short-term debugging needs with features that the average user is going to need indefinitely. (It was not too long ago that there was still debugging code for btree index building in there, for crissakes.)

regards, tom lane
Re: [PERFORM] WAL Optimisation - configuration and usage
Josh Berkus wrote:
> Hmmm. I was told that it was this way for 7.4 as well; that's why it's
> in the docs that way.

No such statement is made in the docs AFAIK: they merely say "If nonzero, turn on WAL-related debugging output." I invented a new #ifdef symbol when making this change in CVS HEAD, so I think you are misremembering.

-Neil
Re: [PERFORM] Bulk INSERT performance in 7.4.1
>>>>> "FW" == Florian Weimer <[EMAIL PROTECTED]> writes:

FW> After increasing the number of checkpoint segments and the
FW> shared-memory buffers, performance is back to the expected levels.
FW> It might even be a bit faster.

If you've got the time, could you try also doing the full bulk insert test with the checkpoint log files on another physical disk? See if that's any faster.

--
Vivek Khera, Ph.D.               Khera Communications, Inc.
Internet: [EMAIL PROTECTED]      Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera  http://www.khera.org/~vivek/
Re: [PERFORM] Scaling further up
On Wed, 3 Mar 2004, Paul Thomas wrote:
> On 02/03/2004 23:25 johnn wrote:
> > [snip]
> > random_page_cost should be set with the following things taken into
> > account:
> > - seek speed
>
> Which is not exactly the same thing as spindle speed as it's a
> combination of spindle speed and track-to-track speed. I think you'll
> find that a 15K rpm disk, whilst it will probably have a lower seek
> time than a 10K rpm disk, won't have a proportionately (i.e., 2/3rds)
> lower seek time.

There are three factors that affect how fast you can get to the next sector: seek time, settle time, and rotational latency. Most drives only list the first, and don't bother to mention the other two.

On many modern drives, the seek times are around 5 to 10 milliseconds. The settle time varies as well: the longer the seek, the longer the settle, generally. This is the time it takes for the head to stop shaking and rest quietly over a particular track. Rotational latency is the amount of time you have to wait, on average, for the sector you want to come under the heads.

Assume an 8 ms seek and a 2 ms settle (typical numbers), and that the rotational latency on average is 1/2 of a rotation. At 10k rpm, a rotation takes 1/166.667 of a second, or 6 ms, so half a rotation is approximately 3 ms. By going to a 15k rpm drive, the latency drops to 2 ms. So, if we add them up on the same basic drive, one being 10k and one being 15k, we get:

10k rpm: 8 + 2 + 3 = 13 ms
15k rpm: 8 + 2 + 2 = 12 ms

So, with the decrease in rotational latency being the only advantage the 15k rpm drive has over the 10k rpm drive, we get a decrease in access time of only 1 ms, or only about an 8% decrease. So, if your random_page_cost on 10k rpm drives was 1.7, you'd need to drop it to 1.57 or so to reflect the speed increase from 15k rpm drives.

I.e. it's much more likely that going from 1 gig to 2 gigs of RAM will make a noticeable difference than going from 10k to 15k drives.
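[Editor's illustration] The arithmetic above can be double-checked with a few lines of Python, using the same assumed 8 ms seek and 2 ms settle and the 1.7 random_page_cost from the example:

```python
# Verify the access-time arithmetic from the message above.
# Assumptions (from the message): 8 ms seek, 2 ms settle, and average
# rotational latency equal to half a rotation.

def access_time_ms(rpm, seek_ms=8.0, settle_ms=2.0):
    rotation_ms = 60_000.0 / rpm          # one full rotation, in ms
    return seek_ms + settle_ms + rotation_ms / 2.0

t10k = access_time_ms(10_000)             # 8 + 2 + 3 = 13 ms
t15k = access_time_ms(15_000)             # 8 + 2 + 2 = 12 ms

print(t10k, t15k)                         # 13.0 12.0
print(round((1 - t15k / t10k) * 100, 1))  # ~7.7% faster overall
print(round(1.7 * t15k / t10k, 2))        # scaled random_page_cost: 1.57
```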
Re: [PERFORM] WAL Optimisation - configuration and usage
Neil Conway wrote:
> Simon Riggs wrote:
>> Josh Berkus wrote:
>>> Simon Riggs wrote:
>>>> Please set WAL_DEBUG to 1 so we can see a bit more info: thanks.
>>>
>>> I'm pretty sure that WAL_DEBUG requires a compile-time option.
>>
>> I'm surprised, but you are right, the manual does SAY this requires a
>> compile-time option; it is unfortunately not correct.
>
> Actually, the manual is correct: in 7.4 and earlier releases, enabling
> wal_debug can be done without also setting a compile-time #ifdef. As
> of current CVS HEAD, the WAL_DEBUG #ifdef must be defined before this
> variable is available.

Touche! I stand corrected, thank you both. My suggestion does work for Rob, then.

[This also implies I have a screwed version on my machine, so thank you also for flushing that lurking issue out for me. I'd had a suspicion for a few weeks. Lucky I'm still just prototyping.]

On the other hand, I was just about to change the wal_debug behaviour to allow better debugging of PITR features as they're added. I think it is very important to be able to put the system fairly easily into debug mode; a recompile is easy enough, but it would be even better to avoid this completely. This would mean reversing the change you describe. Here's the design.

The behaviour I wish to add is:
Keep wal_debug as a value between 0 and 16.
If = 0, then no debug output (default).
Use the following bitmasks against the value:
Mask 1 = XLOG checkpoints get logged
Mask 2 = Archive API calls get logged
Mask 4 = Transaction commits get logged
Mask 8 = Flush & INSERTs get logged

That way it should be fairly straightforward to control the amount and type of information available to administrators. The existing design produces too much info to be easily usable, mostly requiring a Perl program to filter out the info overload and do record counts.

This suggested design allows you to control the volume of messages, since the bitmasks are arranged in volume/frequency order, and brings the wal_debug option back into something useful for problem diagnosis on live systems, not just hacking the code. Anybody object to these mods, or have better/different ideas? Getting the diagnostics right is fairly important, IMHO, to making PITR become real.

Best regards, Simon Riggs
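[Editor's illustration] To make the proposal concrete, here is a small Python sketch of how the bitmask decoding would behave. The constant and function names are hypothetical; this illustrates the design described above, not PostgreSQL source.

```python
# Sketch of the proposed wal_debug bitmask decoding (hypothetical names;
# an illustration of the design in the message above).

WAL_DEBUG_CHECKPOINTS = 1  # XLOG checkpoints get logged
WAL_DEBUG_ARCHIVE     = 2  # Archive API calls get logged
WAL_DEBUG_COMMITS     = 4  # Transaction commits get logged
WAL_DEBUG_INSERTS     = 8  # Flush & INSERTs get logged

def enabled_channels(wal_debug):
    """Return the names of the debug channels enabled by `wal_debug`."""
    names = {
        WAL_DEBUG_CHECKPOINTS: "checkpoints",
        WAL_DEBUG_ARCHIVE: "archive",
        WAL_DEBUG_COMMITS: "commits",
        WAL_DEBUG_INSERTS: "inserts",
    }
    return [name for mask, name in names.items() if wal_debug & mask]

# wal_debug = 5 (1 + 4) would log checkpoints and commits only:
print(enabled_channels(5))   # ['checkpoints', 'commits']
print(enabled_channels(0))   # []  (default: no debug output)
```

Tom Lane's objection in the follow-up still applies: separate named boolean variables would spare DBAs exactly this kind of mental arithmetic.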
Re: [PERFORM] Bulk INSERT performance in 7.4.1
Would turning autocommit off help?

Vivek Khera wrote:
> "FW" == Florian Weimer <[EMAIL PROTECTED]> writes:
>
> FW> After increasing the number of checkpoint segments and the
> FW> shared-memory buffers, performance is back to the expected levels.
> FW> It might even be a bit faster.
>
> If you've got the time, could you try also doing the full bulk insert
> test with the checkpoint log files on another physical disk? See if
> that's any faster.

--
Greg Spiegelberg
Sr. Product Development Engineer
Cranel, Incorporated.
Phone: 614.318.4314
Fax:   614.431.8388
Email: [EMAIL PROTECTED]
Cranel. Technology. Integrity. Focus.
Re: [PERFORM] Bulk INSERT performance in 7.4.1
On Mar 3, 2004, at 4:37 PM, Greg Spiegelberg wrote:
> Would turning autocommit off help?

Doubtful, since the bulk insert is all one transaction.
Re: [PERFORM] WAL Optimisation - configuration and usage
Neil,

> Actually, the manual is correct: in 7.4 and earlier releases, enabling
> wal_debug can be done without also setting a compile-time #ifdef. As
> of current CVS HEAD, the WAL_DEBUG #ifdef must be defined before this
> variable is available.

Hmmm. I was told that it was this way for 7.4 as well; that's why it's in the docs that way.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Database Server Tuning
Vivek,

>> With large RAID, have you found that having WAL on a separate array
>> actually boosts performance? The empirical tests we've seen so far
>> don't seem to support this.
>
> Yes, it was a noticeable improvement.

Do you have any stats? This would be useful for your talk, as well.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Scaling further up
John,

> This would imply that an upgrade in drive RPM should be accompanied by
> a decrease in random_page_cost, correct?

Maybe. Maybe not. Tom's Hardware did some Bonnie++ testing with a variety of new drives last year. They were moderately surprised to find that there were "faster" drives (i.e. higher RPM) which had lower real throughput due to poor onboard software and hardware, such as a small and slow onboard cache.

So, it would be reasonable to assume that a 10,000 RPM Barracuda could support a marginally lower random_page_cost than a 7,200 RPM Barracuda ... but that tells you nothing about a 10,000 RPM Maxtor Diamond (as an example).

Also, many other factors influence real random_page_cost; the size and access pattern of your database is probably much more important than your RPM.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Bulk INSERT performance in 7.4.1
Florian Weimer wrote:
> After an upgrade to 7.4.1 (from 7.3) we see a severe performance
> regression in bulk INSERTs.

It turns out that we were running the default configuration, and not the tuned one in /etc/postgresql. *blush*

After increasing the number of checkpoint segments and the shared-memory buffers, performance is back to the expected levels. It might even be a bit faster.

--
Current mail filters: many dial-up/DSL/cable modem hosts, and the following domains: atlas.cz, bigpond.com, freenet.de, hotmail.com, libero.it, netscape.net, postino.it, tiscali.co.uk, tiscali.cz, tiscali.it, voila.fr, wanadoo.fr, yahoo.com.
Re: [PERFORM] Scaling further up
On Tue, Mar 02, 2004 at 04:50:04PM -0500, Anjan Dave wrote:
> time/resources to do extensive testing, I am not sure if
> Postgres/Solaris9 is really suggested by the community for
> high-performance, as opposed to a XEON/Linux setup. Storage being a
> separate discussion.

I can tell you from experience that performance on Solaris is nowhere close to what you'd expect, given the coin you're forking over for it. I think the reason to use Solaris is its support for all the nifty hot-swappable hardware, and not for its speed or any putative benefit you might get from having 64 bits at your disposal.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
The fact that technology doesn't work is no bar to success in the
marketplace.                        --Philip Greenspun
[PERFORM] Bulk INSERT performance in 7.4.1
After an upgrade to 7.4.1 (from 7.3) we see a severe performance regression in bulk INSERTs. This is apparently caused by constant checkpointing (every 10 to 20 seconds). I've already increased the number of checkpoint segments to 32, but currently, there are just 10 or 11 files in the pg_xlog directory. With 7.3, we had configured checkpoint_segments at 16, and there were 33 pg_xlog files. Checkpoints happened every couple of minutes.

How can I reduce the checkpoint frequency? (I'd like to try that first because it's the most obvious anomaly. Maybe we can look at the involved table later.)
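[Editor's note] The 33 files seen under 7.3 match the documented steady-state bound for that era of roughly 2 x checkpoint_segments + 1 WAL segment files of 16 MB each. A quick back-of-envelope check (plain arithmetic, not PostgreSQL code):

```python
# Back-of-envelope WAL sizing: in 7.3/7.4, pg_xlog normally holds at most
# about 2 * checkpoint_segments + 1 segment files of 16 MB each.

SEGMENT_MB = 16

def max_wal_files(checkpoint_segments):
    return 2 * checkpoint_segments + 1

for segs in (16, 32):
    files = max_wal_files(segs)
    print(f"checkpoint_segments={segs}: up to {files} files, "
          f"~{files * SEGMENT_MB} MB")
# checkpoint_segments=16: up to 33 files, ~528 MB  (matches the 33 seen)
# checkpoint_segments=32: up to 65 files, ~1040 MB
```

The 10 or 11 files reported after raising the setting to 32 are far below the expected ceiling of 65, which hints that the new setting never took effect — consistent with the later discovery in this thread that the default configuration was still in use.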
Re: [PERFORM] Scaling further up
On 02/03/2004 23:25 johnn wrote:
> [snip]
> random_page_cost should be set with the following things taken into
> account:
> - seek speed

Which is not exactly the same thing as spindle speed as it's a combination of spindle speed and track-to-track speed. I think you'll find that a 15K rpm disk, whilst it will probably have a lower seek time than a 10K rpm disk, won't have a proportionately (i.e., 2/3rds) lower seek time.

> - likelihood of page to be cached in memory by the kernel

That's effective cache size.

--
Paul Thomas
Thomas Micro Systems Limited | Software Solutions for the Smaller Business
Computer Consultants | http://www.thomas-micro-systems-ltd.co.uk